Genetic variants on chr 5p12 and 10q26 as markers for use in breast cancer risk assessment, diagnosis, prognosis and treatment

ABSTRACT

The invention pertains to certain genetic variants on Chr5p12 and Chr10q26 as susceptibility variants of breast cancer. Methods of disease management, including diagnosing increased and/or decreased susceptibility to breast cancer, methods of predicting response to therapy and methods of predicting prognosis using such variants are described. The invention further relates to kits useful in the methods of the invention.

BACKGROUND OF THE INVENTION

Breast cancer is by far the most common cancer in women worldwide. Current global incidence is in excess of 1,151,000 new cases diagnosed each year [Parkin, et al., (2005), CA Cancer J Clin, 55, 74-108]. Breast cancer incidence is highest in developed countries, particularly amongst populations of Northern European ethnic origin, and is increasing. In the United States the annual age-standardized incidence rate is approximately 125 cases per 100,000 population, more than three times the world average. Rates in Northern European countries are similarly high. In the year 2008 it is estimated that 184,450 new cases of invasive breast cancer will be diagnosed in the U.S.A. and 40,930 people will die from the disease [Jemal, et al., (2008), CA Cancer J Clin, 58, 71-96]. To this figure must be added a further 67,770 ductal and lobular carcinoma in-situ diagnoses expected in 2008. From an individual perspective, the lifetime probability of developing breast cancer is 12.3% in U.S. women (i.e., 1 in 8 women will develop breast cancer during their lives). As with most cancers, early detection and appropriate treatment are important factors. Overall, the 5-year survival rate for breast cancer is 89%. However, in individuals presenting with regionally invasive or metastatic disease, the rate declines to 84% and 27%, respectively [Jemal, et al., (2008), CA Cancer J Clin, 58, 71-96].

Increasingly, emphasis is falling on the identification of individuals who are at high risk for primary or recurrent breast cancer. Such individuals can be managed by more intensive screening, preventative chemotherapies, hormonal therapies and, in cases of individuals at extremely high risk, prophylactic surgery. Mass screening programs constitute a huge economic burden on health services, while preventative therapies have associated risks and quality of life consequences.

Genetic Predisposition to Breast Cancer

The two primary classes of known risk factors for breast cancer are endocrine factors and genetics. Regarding the latter, approximately 12% of breast cancer patients have one or more first degree relatives with breast cancer [(2001), Lancet, 358, 1389-99]. The well known, dominant breast cancer predisposition genes BRCA1 and BRCA2 confer greatly increased breast cancer risk to carriers, with lifetime penetrance estimates ranging from 40-80%. The presence of BRCA1 and BRCA2 mutations can account for the majority of families with 6 or more cases of breast cancer and for a large proportion of families comprising breast and ovarian or male breast cancer. However, such families are very rare indeed. BRCA1 and BRCA2 mutations are found much less frequently in families with fewer cases or in families characterized by breast cancer cases only. Together, mutations in BRCA1 and BRCA2 can account for 15-20% of the risk for familial breast cancer. In non-founder populations, if all common BRCA mutations could be detected, between 2-3% of incident breast cancer patients would be expected to harbor a mutation [Gorski, et al., (2005), Breast Cancer Res Treat, 92, 19-24; (2000), Br J Cancer, 83, 1301-8]. This low “chance to find” statistic precludes the responsible use of BRCA mutation testing outside families with an obvious hereditary predisposition (Anon [(2003), J Clin Oncol, 21, 2397-406]). Rare, high penetrance mutations are known to occur in the TP53 and PTEN genes; however, these together account for no more than 5% of the total genetic risk for breast cancer [Easton, (1999), Breast Cancer Res, 1, 14-7]. Linkage studies have been largely unsuccessful in identifying any more widespread mutations conferring high risk for breast cancer [Smith, et al., (2006), Genes Chromosomes Cancer, 45, 646-55].

Recent epidemiological studies have indicated that the majority of breast cancer cases arise in a predisposed, susceptible minority of the population [Antoniou, et al., (2002), Br J Cancer, 86, 76-83; Pharoah, et al., (2002), Nat Genet, 31, 33-6]. Data from twin studies and observations of the constant, high incidence of cancer in the contralateral breast of patients surviving primary breast cancer indicate that a substantial portion of the uncharacterized risk for breast cancer is related to endogenous factors, most probably genetic [Lichtenstein, et al., (2000), N Engl J Med, 343, 78-85; Peto and Mack, (2000), Nat Genet, 26, 411-4]. Knowledge of the genetic factors that underpin this widespread risk is very limited. Segregation analyses predict that the uncharacterized genetic risk for breast cancer is most likely to be polygenic in nature, with risk alleles that confer low to moderate risk and which may interact with each other and with hormonal risk factors. Nevertheless, these studies predict as much as 40-fold differences in relative risk between the highest and lowest quintiles of a distribution that could be defined by genetic profiling that captures these low to moderate risk alleles [Antoniou, et al., (2002), Br J Cancer, 86, 76-83; Pharoah, et al., (2002), Nat Genet, 31, 33-6]. Eighty-eight percent of all breast cancer cases are expected to arise amongst a predisposed 50% of the population, and the 12% of the population at highest risk accounts for 50% of all breast cancer cases [Pharoah, et al., (2002), Nat Genet, 31, 33-6; Pharoah, (2003), Recent Results Cancer Res, 163, 7-18; discussion 264-6]. Much focus is therefore directed towards the identification of such genetically predisposed individuals and developing personalized medical management strategies for them.

We and others have shown that there is a significant familial risk of breast cancer in Iceland which extends to at least fifth-degree relatives [Amundadottir, et al., (2004), PLoS Med, 1, e65; Tulinius, et al., (2002), J Med Genet, 39, 457-62]. The contribution of BRCA1 mutations to familial risk in Iceland is thought to be minimal [Arason, et al., (1998), J Med Genet, 35, 446-9; Bergthorsson, et al., (1998), Hum Mutat, Suppl 1, S195-7]. A single founder mutation in the BRCA2 gene (999del5) is present at a carrier frequency of 0.6-0.8% in the general Icelandic population and 7.7-8.6% in female breast cancer patients [Thorlacius, et al., (1997), Am J Hum Genet, 60, 1079-84; Gudmundsson, et al., (1996), Am J Hum Genet, 58, 749-56]. This single mutation is estimated to account for approximately 40% of the inherited breast cancer risk to first through third degree relatives [Tulinius, et al., (2002), J Med Genet, 39, 457-62]. Although this estimate is higher than the 15-25% of familial risk attributed to all BRCA1 and BRCA2 mutations combined in non-founder populations, there is still some 60% of Icelandic familial breast cancer risk to be explained. First degree relatives of patients who test negative for BRCA2 999del5 remain at 1.72 times the population risk for breast cancer (95% CI 1.49-1.96) [Tulinius, et al., (2002), J Med Genet, 39, 457-62].

Genetic risk is conferred by subtle differences in the genome among individuals within a population. Genes differ between individuals most frequently due to single nucleotide polymorphisms (SNPs), although other variations are also important. SNPs are located on average every 1000 base pairs in the human genome. Accordingly, a typical human gene containing 250,000 base pairs may contain 250 different SNPs. Only a small number of SNPs are located in exons and alter the amino acid sequence of the protein encoded by the gene. Most SNPs may have little or no effect on gene function, while others may alter transcription, splicing, translation, or stability of the mRNA encoded by the gene. Additional genetic polymorphism in the human genome is caused by insertion, deletion, translocation, or inversion of either short or long stretches of DNA. Genetic polymorphisms conferring disease risk may therefore directly alter the amino acid sequence of proteins, may increase the amount of protein produced from the gene, or may decrease the amount of protein produced by the gene.

As genetic polymorphisms conferring risk of common disease are uncovered, genetic testing for such risk factors becomes important for clinical medicine. Recent examples are apolipoprotein E testing to identify genetic carriers of the apoE4 polymorphism in dementia patients for the differential diagnosis of Alzheimer's disease, and Factor V Leiden testing for predisposition to deep venous thrombosis. More importantly, in the treatment of cancer, diagnosis of genetic variants in tumor cells is used for the selection of the most appropriate treatment regime for the individual patient. In breast cancer, genetic variation in expression of the estrogen receptor or of the Her2 (human epidermal growth factor receptor 2) receptor tyrosine kinase determines whether anti-estrogenic drugs (tamoxifen) or anti-Her2 antibody (Herceptin) will be incorporated into the treatment plan. In chronic myeloid leukemia (CML), diagnosis of the Philadelphia chromosome genetic translocation fusing the genes encoding Bcr and the Abl tyrosine kinase indicates that Gleevec (STI571), a specific inhibitor of the Bcr-Abl kinase, should be used for treatment of the cancer. For CML patients with such a genetic alteration, inhibition of the Bcr-Abl kinase leads to rapid elimination of the tumor cells and remission from leukemia.

Understanding of the genetic factors contributing to the residual genetic risk for breast cancer is limited. Variants in two genes have been rigorously confirmed as low penetrance breast cancer risk genes: CHEK2 and ATM [Renwick, et al., (2006), Nat Genet, 38, 873-5; (2004), Am J Hum Genet, 74, 1175-82]. Furthermore, a recent report establishes a link between variants on chromosomes 2q35 and 16q12 and increased risk of estrogen receptor positive breast cancer (Stacey, S N. et al. Nat Genet 39:865-9 (2007)). Many other genes have been implicated; however, their contribution to breast cancer risk has not been confirmed in analyses employing very large sample sets [Breast Cancer Association Consortium, (2006), J Natl Cancer Inst, 98, 1382-96].

No universally successful method for the prevention or treatment of breast cancer is currently available. Management of breast cancer currently relies on a combination of primary prevention, early diagnosis, appropriate treatments and secondary prevention. There are clear clinical imperatives for integrating genetic testing into all aspects of these management areas. Identification of cancer susceptibility genes may also reveal key molecular pathways that may be manipulated (e.g., using small or large molecular weight drugs) and may lead to more effective treatments.

SUMMARY OF THE INVENTION

The present invention relates to methods of assessing a susceptibility to breast cancer. The invention includes methods of diagnosing an increased susceptibility to breast cancer, as well as methods of diagnosing a decreased susceptibility to breast cancer or diagnosing a protection against cancer, by evaluating certain markers or haplotypes that have been found to be associated with increased or decreased susceptibility to breast cancer. The invention also relates to methods of assessing prognosis of individuals diagnosed with breast cancer, methods of assessing the probability of response to a breast cancer therapeutic agent or breast cancer therapy, as well as methods of monitoring progress of treatment of an individual diagnosed with breast cancer.

In one aspect, the present invention relates to a method of diagnosing a susceptibility to breast cancer in a human individual, the method comprising determining the presence or absence of at least one allele of at least one polymorphic marker on chromosome 5p12 or on chromosome 10q26 in a nucleic acid sample obtained from the individual, wherein the presence of the at least one allele is indicative of a susceptibility to breast cancer. The invention also relates to a method of determining a susceptibility to breast cancer, by determining the presence or absence of at least one allele of at least one polymorphic marker on chromosome 5p12 or on chromosome 10q26 in a nucleic acid sample from the individual, wherein the determination of the presence of the at least one allele is indicative of a susceptibility to breast cancer.

In another aspect, the invention relates to a method of determining a susceptibility to breast cancer in a human individual, comprising determining whether at least one at-risk allele in at least one polymorphic marker is present in a genotype dataset derived from the individual, wherein the at least one polymorphic marker is selected from markers within chromosome 5p12, and wherein determination of the presence of the at least one at-risk allele is indicative of increased susceptibility to breast cancer in the individual.

The invention furthermore relates to a method for determining a susceptibility to breast cancer in a human individual, comprising determining whether at least one allele of at least one polymorphic marker is present in a nucleic acid sample obtained from the individual or in a genotype dataset derived from the individual, wherein the at least one polymorphic marker is selected from rs10941679 (SEQ ID NO:236), rs4415084 (SEQ ID NO:235), and rs1219648 (SEQ ID NO:237), and markers in linkage disequilibrium therewith, and wherein the presence of the at least one allele is indicative of a susceptibility to breast cancer for the individual.

In one embodiment, the genotype dataset comprises information about marker identity and the allelic status of the individual, i.e., information about the identity of the two alleles carried by the individual for the marker. The genotype dataset may comprise allelic information about one or more markers, including two or more markers, three or more markers, five or more markers, one hundred or more markers, etc. In some embodiments, the genotype dataset comprises genotype information from a whole-genome assessment of the individual that may include hundreds of thousands of markers, or even one million or more markers.
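The genotype dataset described above can be sketched as a small in-memory structure that maps each marker identifier to the two alleles carried by the individual. This is a minimal illustration under stated assumptions, not a format defined by the invention: the class and method names (`Genotype`, `GenotypeDataset`, `add`, `alleles_at`) are hypothetical, and the allele calls in the usage lines are placeholders rather than observed genotypes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Genotype:
    """Unphased allelic status of one individual at one marker (hypothetical type)."""
    marker: str                 # marker identity, e.g. an rs identifier
    alleles: Tuple[str, str]    # the two alleles carried by the individual


@dataclass
class GenotypeDataset:
    """Genotype calls for a single individual, keyed by marker identifier."""
    individual_id: str
    genotypes: Dict[str, Genotype] = field(default_factory=dict)

    def add(self, marker: str, allele1: str, allele2: str) -> None:
        # Store (or overwrite) the genotype call for one marker.
        self.genotypes[marker] = Genotype(marker, (allele1, allele2))

    def alleles_at(self, marker: str) -> Optional[Tuple[str, str]]:
        # Return the allele pair, or None if the marker was not genotyped.
        call = self.genotypes.get(marker)
        return call.alleles if call is not None else None

    def markers(self) -> List[str]:
        # Identifiers of all markers present in the dataset, sorted.
        return sorted(self.genotypes)


# Usage with placeholder calls (not observed genotypes):
ds = GenotypeDataset("individual-001")
ds.add("rs4415084", "C", "T")
ds.add("rs1219648", "G", "G")
```

A whole-genome dataset would have the same shape with far more entries; in practice such data are usually kept in dedicated file formats (for example VCF) rather than as Python objects.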

In certain embodiments, the at least one polymorphic marker is associated with the FGF10 gene, the HCN1 gene, the MRPS30 gene, and/or the FGFR2 gene. In certain such embodiments, the at least one polymorphic marker is in linkage disequilibrium with the FGF10 gene, the HCN1 gene, the MRPS30 gene, and/or the FGFR2 gene. In certain other embodiments, the at least one polymorphic marker is selected from the group of markers located within the chromosomal segment spanning positions 44,666,047 to 44,976,797 in NCBI Build 34, and markers in linkage disequilibrium therewith. In another embodiment, the at least one polymorphic marker is selected from the group consisting of the polymorphic markers listed in Table 1 and Table 3, and markers in linkage disequilibrium therewith.

In certain embodiments, the at least one polymorphic marker is selected from the markers set forth in Table 12, Table 13 and Table 14. In one embodiment, the at least one polymorphic marker is selected from the markers as set forth in SEQ ID NO:1-237. In one embodiment, the markers in linkage disequilibrium with marker rs4415084 are selected from the markers set forth in Table 12. In another embodiment, the markers in linkage disequilibrium with marker rs10941679 are selected from the markers set forth in Table 13. In another embodiment, the markers in linkage disequilibrium with marker rs1219648 are selected from the markers set forth in Table 14.

In certain embodiments, a further step of assessing the frequency of at least one haplotype in the individual is performed. In such embodiments, two or more markers, including three, four, five, six, seven, eight, nine or ten or more markers, can be included in the haplotype. In one embodiment, the haplotype comprises markers in the chromosome 5p12 region. In another embodiment, the haplotype comprises markers in the chromosome 10q26 region. In certain embodiments, the haplotype comprises markers in linkage disequilibrium with rs4415084. In certain other embodiments, the haplotype comprises markers in linkage disequilibrium with rs10941679. In certain other embodiments, the haplotype comprises markers in linkage disequilibrium with rs1219648.

The markers conferring risk of breast cancer, as described herein, can be combined with other genetic markers for breast cancer. Thus, in certain embodiments, a further step is included, comprising determining whether at least one at-risk allele of at least one at-risk variant for breast cancer not in linkage disequilibrium with any one of the markers set forth in Table 12, Table 13 and Table 14 is present in a sample comprising genomic DNA from a human individual or a genotype dataset derived from a human individual. In other words, genetic markers in other locations in the genome can be useful in combination with the markers of the present invention, so as to determine overall risk of breast cancer based on multiple genetic factors. Selection of markers that are not in linkage disequilibrium (not in LD) can be based on a suitable measure for linkage disequilibrium, as described further herein. In certain embodiments, markers that are not in linkage disequilibrium have values for the LD measure r² between the markers of less than 0.2. In certain other embodiments, markers that are not in LD have values for r² between the markers of less than 0.15, including less than 0.10, less than 0.05, less than 0.02 and less than 0.01. Other suitable cutoff values for establishing that markers are not in LD are contemplated, including values bridging any of these values.
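The r² cutoff above can be made concrete. For two biallelic markers with allele frequencies p_A and p_B and haplotype frequency p_AB, the standard LD coefficient is D = p_AB − p_A·p_B, and r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)). The sketch below computes r² from haplotype frequencies that are assumed to have been estimated already (for example from phased data) and keeps only the candidate markers whose r² with a reference marker falls below a chosen threshold. The function names and example frequencies are hypothetical.

```python
from typing import Dict, List, Tuple


def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """LD measure r^2 from the A-B haplotype frequency and the two allele frequencies."""
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic marker")
    d = p_ab - p_a * p_b          # LD coefficient D
    return d * d / denom


def markers_not_in_ld(candidates: Dict[str, Tuple[float, float, float]],
                      threshold: float = 0.2) -> List[str]:
    """Return candidate markers whose r^2 with the reference marker is below threshold.

    Each value is (p_ab, p_a, p_b): frequency of the haplotype carrying both tested
    alleles, allele frequency at the reference marker, allele frequency at the candidate.
    """
    return [name for name, (p_ab, p_a, p_b) in candidates.items()
            if r_squared(p_ab, p_a, p_b) < threshold]


# Hypothetical frequencies for illustration only:
example = {
    "candidate_1": (0.25, 0.5, 0.5),   # independent: r^2 = 0
    "candidate_2": (0.35, 0.5, 0.5),   # r^2 = 0.16
    "candidate_3": (0.50, 0.5, 0.5),   # complete LD: r^2 = 1
}
```

With the default threshold of 0.2, `candidate_1` and `candidate_2` would be treated as not in LD; tightening the threshold to 0.15 keeps only `candidate_1`.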

In certain embodiments, multiple markers as described herein are assessed to determine overall risk of breast cancer. Thus, in certain embodiments, an additional step is included, the step comprising determining whether at least one allele in each of at least two polymorphic markers is present in a sample comprising genomic DNA from a human individual or a genotype dataset derived from a human individual, wherein the presence of the at least one allele in the at least two polymorphic markers is indicative of an increased susceptibility to breast cancer. In one embodiment, the markers are selected from rs4415084 (SEQ ID NO:235), rs10941679 (SEQ ID NO:236) and rs1219648 (SEQ ID NO:237), and markers in linkage disequilibrium therewith.
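One common way to turn the presence of at-risk alleles at several markers into a single overall figure is to assume a multiplicative model: each copy of an at-risk allele multiplies risk by that marker's per-allele relative risk, and markers that are not in LD contribute independently. The sketch below assumes exactly that model; the per-allele relative risks in the example are placeholders, not estimates for the markers of the invention, and the function names are hypothetical.

```python
from typing import Dict


def genotype_relative_risk(per_allele_rr: float, n_risk_alleles: int) -> float:
    """Relative risk versus non-carriers under a multiplicative per-allele model."""
    if n_risk_alleles not in (0, 1, 2):
        raise ValueError("a diploid genotype carries 0, 1 or 2 copies")
    return per_allele_rr ** n_risk_alleles


def combined_relative_risk(per_allele_rr: Dict[str, float],
                           risk_allele_counts: Dict[str, int]) -> float:
    """Multiply per-marker genotype risks, assuming the markers act independently."""
    total = 1.0
    for marker, count in risk_allele_counts.items():
        total *= genotype_relative_risk(per_allele_rr[marker], count)
    return total


# Placeholder per-allele relative risks (illustrative values only):
rr = {"marker_A": 1.2, "marker_B": 1.5}
counts = {"marker_A": 2, "marker_B": 1}
```

With these placeholder values, a homozygous carrier at `marker_A` who also carries one copy at `marker_B` has a combined relative risk of 1.2² × 1.5 = 2.16 compared with someone carrying no at-risk alleles at either marker.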

Risk assessment based on the markers of the present invention can also be combined with assessment for the presence or absence of at least one high penetrance genetic factor for breast cancer in a nucleic acid sample obtained from the individual or in a genotype dataset derived from the individual. The high penetrance genetic factor for breast cancer can, for example, be a BRCA1 mutation, a BRCA2 mutation, a TP53 mutation or a PTEN mutation. Together, mutations in BRCA1 and BRCA2 can account for 15-20% of the risk for familial breast cancer, and such mutations are found in 2-3% of incident breast cancer patients [Gorski, et al., (2005), Breast Cancer Res Treat, 92, 19-24; (2000), Br J Cancer, 83, 1301-8]. Known mutations in the TP53 and PTEN genes account for about 5% of the total genetic risk for breast cancer [Easton, (1999), Breast Cancer Res, 1, 14-7]. In one embodiment, the high penetrance genetic factor is BRCA2 999del5.

The genetic markers of the invention can also be combined with non-genetic information to establish overall risk for an individual. Thus, in certain embodiments, a further step is included, comprising analyzing non-genetic information to make risk assessment, diagnosis, or prognosis of the individual. The non-genetic information can be any information pertaining to the disease status of the individual or other information that can influence the estimate of overall risk of breast cancer for the individual. In one embodiment, the non-genetic information is selected from age, gender, ethnicity, socioeconomic status, previous disease diagnosis, medical history of subject, family history of breast cancer, biochemical measurements, and clinical measurements.

In another aspect, the invention relates to a method of assessing risk of developing at least a second primary tumor in an individual previously diagnosed with breast cancer, the method comprising determining the presence or absence of at least one allele of at least one polymorphic marker in a nucleic acid sample obtained from the individual, wherein the at least one polymorphic marker is selected from the group consisting of the polymorphic markers listed in Tables 12, 13 and 14, and markers in linkage disequilibrium therewith, wherein the presence of the at least one allele is indicative of risk of developing at least a second primary tumor. Alternatively, the invention relates to a method of determining risk of developing at least a second primary tumor in an individual previously diagnosed with breast cancer, the method comprising determining whether at least one allele of at least one polymorphic marker is present in a nucleic acid sample obtained from the individual, or in a genotype dataset derived from the individual, wherein the at least one polymorphic marker is selected from rs10941679 (SEQ ID NO:236), rs4415084 (SEQ ID NO:235), and rs1219648 (SEQ ID NO:237), and markers in linkage disequilibrium therewith, and wherein the presence of the at least one allele is indicative of risk of developing at least a second primary tumor. In one such embodiment, the at least one polymorphic marker is selected from the markers set forth in Table 12, Table 13 and Table 14.

The invention also relates to an apparatus for determining a genetic indicator for breast cancer in a human individual, comprising: a computer readable memory; and a routine stored on the computer readable memory; wherein the routine is adapted to be executed on a processor to analyze marker and/or haplotype information for at least one human individual with respect to at least one polymorphic marker selected from rs10941679 (SEQ ID NO:236), rs4415084 (SEQ ID NO:235), and rs1219648 (SEQ ID NO:237), and markers in linkage disequilibrium therewith, and generate an output based on the marker or haplotype information, wherein the output comprises an individual risk measure of the at least one marker or haplotype as a genetic indicator of breast cancer for the human individual. In one embodiment, the at least one polymorphic marker is selected from the markers set forth in Table 12, Table 13 and Table 14. In one embodiment, the routine further comprises a risk measure for breast cancer associated with the at least one marker allele and/or haplotype, wherein the risk measure is based on a comparison of the frequency of at least one allele of at least one polymorphic marker and/or haplotype in a plurality of individuals diagnosed with breast cancer and an indicator of the frequency of the at least one allele of at least one polymorphic marker and/or haplotype in a plurality of reference individuals, and wherein the individual risk for the human individual is based on a comparison of the carrier status of the individual for the at least one marker allele and/or haplotype and the risk measure for the at least one marker allele and/or haplotype.
For example, the risk measure may in certain embodiments be a measure of risk conferred by each copy of an at-risk variant for breast cancer in a population of individuals with breast cancer, compared with controls. Based on such reference data, risk for a particular individual can be estimated by determining his/her genotype status at the particular marker and calculating a risk for the individual based thereupon. If the individual carries one copy of the genetic risk variant in his/her genome, the calculated risk can be based on the risk conferred by a single copy of the risk variant. If the individual carries two copies of the genetic risk variant, i.e., the individual is homozygous for the at-risk variant, then the risk estimate for the individual can be based on the risk observed for homozygous carriers in a group of individuals, compared with controls. Under the commonly assumed multiplicative model, the risk for homozygous carriers is the risk for a single copy of the variant squared. Other methods for reporting or estimating risk for the individual based on genotype status at particular markers are also possible, and within the scope of the present invention.
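The per-genotype calculation described above can be written out explicitly. With per-allele relative risk r, non-carriers have risk 1, heterozygotes r and homozygotes r² under the multiplicative model. Reporting an individual's risk relative to the population average is one possible convention, not the only one: assuming Hardy-Weinberg equilibrium with at-risk allele frequency p, the average is (1 − p)² + 2p(1 − p)r + p²r², which simplifies to (1 − p + pr)². The function name and the numbers below are hypothetical.

```python
def risk_relative_to_population(r: float, p: float, n_copies: int) -> float:
    """Genotype risk divided by the population-average risk.

    Assumes a multiplicative model (risk r**n_copies versus non-carriers) and
    Hardy-Weinberg genotype frequencies for an at-risk allele of frequency p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    if n_copies not in (0, 1, 2):
        raise ValueError("n_copies must be 0, 1 or 2")
    # Weighted average of genotype risks over (non-carrier, heterozygote, homozygote).
    population_average = (1 - p) ** 2 + 2 * p * (1 - p) * r + p ** 2 * r ** 2
    return r ** n_copies / population_average
```

For example, with r = 1.5 and p = 0.5 the population average is 1.25² = 1.5625, so a heterozygote is at 1.5 / 1.5625 = 0.96 times the average risk and a homozygote at 2.25 / 1.5625 = 1.44 times, even though both are at higher risk than non-carriers.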

In another aspect, the invention relates to a method of identification of a marker for use in assessing susceptibility to breast cancer, the method comprising: identifying at least one polymorphic marker in linkage disequilibrium with at least one of rs10941679 (SEQ ID NO:236), rs4415084 (SEQ ID NO:235), and rs1219648 (SEQ ID NO:237); determining the genotype status of a sample of individuals diagnosed with, or having a susceptibility to, breast cancer; and determining the genotype status of a sample of control individuals; wherein a significant difference in frequency of at least one allele in at least one polymorphism in individuals diagnosed with, or having a susceptibility to, breast cancer, as compared with the frequency of the at least one allele in the control sample is indicative of the at least one polymorphism being useful for assessing susceptibility to breast cancer. A significant difference can be established by statistical analysis of allelic counts at certain polymorphic markers in breast cancer patients and controls. In one embodiment, a significant difference is based on a calculated P-value between breast cancer patients and controls of less than 0.05. In one embodiment, an increase in frequency of the at least one allele in the at least one polymorphism in individuals diagnosed with, or having a susceptibility to, breast cancer, as compared with the frequency of the at least one allele in the control sample is indicative of the at least one polymorphism being useful for assessing increased susceptibility to breast cancer. In another embodiment, a decrease in frequency of the at least one allele in the at least one polymorphism in individuals diagnosed with, or having a susceptibility to, breast cancer, as compared with the frequency of the at least one allele in the control sample is indicative of the at least one polymorphism being useful for assessing decreased susceptibility to, or protection against, breast cancer.
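The case-control comparison above is typically carried out on a 2×2 table of allele counts (at-risk versus other allele, in patients versus controls). The sketch below uses the standard Pearson chi-square statistic without continuity correction, its one-degree-of-freedom P-value (via the identity P = erfc(√(χ²/2))), and the allelic odds ratio. It is one reasonable test for this purpose, not necessarily the analysis used in the studies behind this document; the function name and the counts in the example are made up.

```python
import math
from typing import Tuple


def allelic_association(case_risk: int, case_other: int,
                        ctrl_risk: int, ctrl_other: int) -> Tuple[float, float, float]:
    """Pearson chi-square (1 df), P-value and allelic odds ratio from allele counts.

    Each individual contributes two alleles, so counts are 2 x the number of people.
    """
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column of the 2x2 table needs a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / margins
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return chi2, p_value, odds_ratio


# Made-up allele counts: 100 case alleles and 100 control alleles.
chi2, p, odds = allelic_association(60, 40, 40, 60)
```

For these made-up counts χ² = 8.0, P ≈ 0.0047 and the allelic odds ratio is 2.25, which would satisfy the P < 0.05 criterion described above.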

The invention also relates to a method of genotyping a nucleic acid sample obtained from a human individual, comprising determining whether at least one allele of at least one polymorphic marker is present in a nucleic acid sample from the individual, wherein the at least one marker is selected from rs10941679 (SEQ ID NO:236), rs4415084 (SEQ ID NO:235), and rs1219648 (SEQ ID NO:237), and markers in linkage disequilibrium therewith, and wherein determination of the presence of the at least one allele in the sample is indicative of a susceptibility to breast cancer in the individual. In one embodiment, determination of the presence of allele T in rs4415084 (SEQ ID NO:235), allele G in rs10941679 (SEQ ID NO:236) and/or allele G in rs1219648 (SEQ ID NO:237) is indicative of increased susceptibility to breast cancer in the individual. In one embodiment, genotyping comprises amplifying a segment of a nucleic acid that comprises the at least one polymorphic marker by Polymerase Chain Reaction (PCR), using a nucleotide primer pair flanking the at least one polymorphic marker. In another embodiment, genotyping is performed using a process selected from allele-specific probe hybridization, allele-specific primer extension, allele-specific amplification, nucleic acid sequencing, 5′-exonuclease digestion, molecular beacon assay, oligonucleotide ligation assay, size analysis, single-stranded conformation analysis and microarray technology. In one embodiment, the microarray technology is Molecular Inversion Probe array technology or BeadArray Technologies. In one embodiment, the process comprises allele-specific probe hybridization. In another embodiment, the process comprises microarray technology.
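Once genotype calls are available, whichever assay produced them, determining the presence of the at-risk alleles named above (T at rs4415084, G at rs10941679, G at rs1219648) reduces to counting matching alleles. This sketch assumes the calls are reported on the same strand as the allele designations used in this document; the function names are hypothetical and the non-risk allele letters in the example are placeholders.

```python
from typing import Dict, Tuple

# At-risk alleles as stated in the text; strand orientation is assumed to match the calls.
AT_RISK_ALLELES: Dict[str, str] = {
    "rs4415084": "T",
    "rs10941679": "G",
    "rs1219648": "G",
}


def count_risk_alleles(calls: Dict[str, Tuple[str, str]]) -> Dict[str, int]:
    """Number of at-risk alleles (0, 1 or 2) per marker; ungenotyped markers are skipped."""
    counts: Dict[str, int] = {}
    for marker, risk_allele in AT_RISK_ALLELES.items():
        call = calls.get(marker)
        if call is None:
            continue
        counts[marker] = sum(1 for allele in call if allele == risk_allele)
    return counts


def carries_any_risk_allele(calls: Dict[str, Tuple[str, str]]) -> bool:
    """True if at least one at-risk allele is present at any of the three markers."""
    return any(n > 0 for n in count_risk_alleles(calls).values())


# Placeholder calls for one individual:
calls = {"rs4415084": ("C", "T"), "rs1219648": ("A", "A")}
```

Here the individual carries one at-risk allele at rs4415084, none at rs1219648, and has no call at rs10941679, so that marker is omitted from the counts rather than treated as non-carrier.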
One preferred embodiment comprises the steps of (1) contacting copies of the nucleic acid with a detection oligonucleotide probe and an enhancer oligonucleotide probe under conditions for specific hybridization of the oligonucleotide probe with the nucleic acid; wherein (a) the detection oligonucleotide probe is from 5-100 nucleotides in length and specifically hybridizes to a first segment of a nucleic acid whose nucleotide sequence is given by any one of SEQ ID NO:1-237; (b) the detection oligonucleotide probe comprises a detectable label at its 3′ terminus and a quenching moiety at its 5′ terminus; (c) the enhancer oligonucleotide is from 5-100 nucleotides in length and is complementary to a second segment of the nucleotide sequence that is 5′ relative to the oligonucleotide probe, such that the enhancer oligonucleotide is located 3′ relative to the detection oligonucleotide probe when both oligonucleotides are hybridized to the nucleic acid; and (d) a single base gap exists between the first segment and the second segment, such that when the oligonucleotide probe and the enhancer oligonucleotide probe are both hybridized to the nucleic acid, a single base gap exists between the oligonucleotides; (2) treating the nucleic acid with an endonuclease that will cleave the detectable label from the 3′ terminus of the detection probe to release free detectable label when the detection probe is hybridized to the nucleic acid; and (3) measuring free detectable label, wherein the presence of the free detectable label indicates that the detection probe specifically hybridizes to the first segment of the nucleic acid, and indicates the sequence of the polymorphic site as the complement of the detection probe.
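The geometry of the detection probe and enhancer oligonucleotide in this embodiment can be checked with a few lines of string manipulation. Writing the target strand 5′ to 3′, the second segment lies on the 5′ side of the first segment with exactly one unpaired base between them, and each oligonucleotide is the reverse complement of the segment it binds. The sketch below derives both oligonucleotides from a target sequence and verifies the 5-100 nucleotide length limits and that the polymorphic site falls inside the first segment. It is a hypothetical helper for illustration only: it models neither the label, the quencher nor hybridization conditions, and the toy target is not one of SEQ ID NO:1-237.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_probe_pair(target: str, first_start: int, first_len: int,
                      enhancer_len: int, snp_index: int) -> Tuple[str, str]:
    """Return (detection_probe, enhancer) for a target written 5'->3'.

    First segment (detection probe site): target[first_start:first_start + first_len].
    Gap: the single base at first_start - 1.
    Second segment (enhancer site): the enhancer_len bases immediately 5' of the gap.
    """
    for length in (first_len, enhancer_len):
        if not 5 <= length <= 100:
            raise ValueError("oligonucleotides must be 5-100 nucleotides long")
    first_end = first_start + first_len
    if not first_start <= snp_index < first_end:
        raise ValueError("polymorphic site must lie within the first segment")
    second_end = first_start - 1            # exclusive end; leaves one gap base
    second_start = second_end - enhancer_len
    if second_start < 0 or first_end > len(target):
        raise ValueError("segments do not fit within the target sequence")
    detection_probe = reverse_complement(target[first_start:first_end])
    enhancer = reverse_complement(target[second_start:second_end])
    return detection_probe, enhancer


# Toy target: second segment CCCCC, gap base A, first segment GGGGT.
probe, enhancer = design_probe_pair("CCCCCAGGGGT", first_start=6, first_len=5,
                                    enhancer_len=5, snp_index=10)
```

When both oligonucleotides are annealed, the enhancer sits on the 3′ side of the detection probe, and the probe's 3′ end (which carries the label in this embodiment) faces the single-base gap.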

A further aspect of the invention pertains to a method of assessing an individual for probability of response to a breast cancer therapeutic agent, comprising: determining whether at least one allele of at least one polymorphic marker is present in a nucleic acid sample obtained from the individual, or in a genotype dataset derived from the individual, wherein the at least one polymorphic marker is selected from rs10941679 (SEQ ID NO:236), rs4415084 (SEQ ID NO:235), and rs1219648 (SEQ ID NO:237), and markers in linkage disequilibrium therewith, wherein the presence of the at least one allele of the at least one marker is indicative of a probability of a positive response to the therapeutic agent.

The invention in another aspect relates to a method of predicting prognosis of an individual diagnosed with breast cancer, the method comprising determining whether at least one allele of at least one polymorphic marker is present in a nucleic acid sample obtained from the individual, or in a genotype dataset derived from the individual, wherein the at least one polymorphic marker is selected from rs10941679 (SEQ ID NO:236), rs4415084 (SEQ ID NO:235), and rs1219648 (SEQ ID NO:237), and markers in linkage disequilibrium therewith, wherein the presence of the at least one allele is indicative of a worse prognosis of the breast cancer in the individual.

Yet another aspect of the invention relates to a method of monitoring progress of treatment of an individual undergoing treatment for breast cancer, the method comprising determining whether at least one allele of at least one polymorphic marker is present in a nucleic acid sample obtained from the individual, or in a genotype dataset derived from the individual, wherein the at least one polymorphic marker is selected from rs10941679 (SEQ ID NO:236), rs4415084 (SEQ ID NO:235), and rs1219648 (SEQ ID NO:237), and markers in linkage disequilibrium therewith, wherein the presence of the at least one allele is indicative of the treatment outcome of the individual. In one embodiment, the treatment is treatment by surgery, treatment by radiation therapy, or treatment by drug administration.

The invention also relates to the use of an oligonucleotide probe in the manufacture of a reagent for diagnosing and/or assessing susceptibility to breast cancer in a human individual, wherein the probe hybridizes to a segment of a nucleic acid with nucleotide sequence as set forth in any one of SEQ ID NO:1-237, wherein the probe is 15-500 nucleotides in length. In certain embodiments, the probe is about 16 to about 100 nucleotides in length. In certain other embodiments, the probe is about 20 to about 50 nucleotides in length. In certain other embodiments, the probe is about 20 to about 30 nucleotides in length.
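The length requirement above can be illustrated with a minimal Python sketch, assuming the target segment is supplied as a plain A/C/G/T string written 5′ to 3′. The helper names `reverse_complement` and `design_probe` are hypothetical. The sketch only builds a perfectly complementary probe inside the 15-500 nucleotide window; it does not model melting temperature, secondary structure or genome-wide specificity, which practical probe design would also have to check.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def design_probe(target, start, length, min_len=15, max_len=500):
    """Return a probe that hybridizes to target[start:start + length].

    The probe is the reverse complement of the chosen segment, so it pairs
    antiparallel with the target strand. Raises ValueError if the length
    falls outside [min_len, max_len] or the segment runs off the target.
    """
    if not min_len <= length <= max_len:
        raise ValueError(f"probe length {length} outside {min_len}-{max_len} nt")
    if start < 0 or start + length > len(target):
        raise ValueError("segment extends beyond the target sequence")
    return reverse_complement(target[start:start + length])
```

Taking the reverse complement, rather than the plain complement, keeps the probe itself written 5′ to 3′, matching the sequence convention stated in the definitions.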

The invention also relates to computer-readable media. In one aspect, the invention relates to a medium on which is stored: an identifier for at least one polymorphic marker; an indicator of the frequency of at least one allele of said at least one polymorphic marker in a plurality of individuals diagnosed with breast cancer; and an indicator of the frequency of the at least one allele of said at least one polymorphic marker in a plurality of reference individuals; wherein the at least one polymorphic marker is selected from rs10941679 (SEQ ID NO:236), rs4415084 (SEQ ID NO:235), and rs1219648 (SEQ ID NO:237), and polymorphic markers in linkage disequilibrium therewith. In one embodiment, the polymorphic marker is selected from the markers set forth in Table 12, Table 13 and Table 14. In another embodiment, the medium further comprises information about the ancestry of the plurality of individuals.

Various diagnoses and categories of the breast cancer phenotype are within scope of the present invention. In its broadest sense, the invention relates to any breast cancer phenotype. Breast cancer, in certain embodiments, includes any clinical diagnosis of breast cancer, including, but not limited to: invasive ductal, invasive lobular, tubular, or otherwise invasive or mixed invasive; medullary, DCIS (Ductal Carcinoma In-Situ), LCIS (Lobular Carcinoma In-Situ), or otherwise non-invasive; and invasive breast cancer, including stage 0, stage 1, stage 2 (including stage 2a and stage 2b), stage 3 (including stage 3a, stage 3b and stage 3c) and stage 4 breast cancer. In certain embodiments, the breast cancer phenotype is selected from All Breast Cancer, Multiple Primary Breast Cancer, and early onset Breast Cancer. In some embodiments, the markers of the invention are associated with risk of breast cancer in individuals with a family history of breast cancer. In one such embodiment, the summed family history score (FHS) is the phenotype associated with breast cancer. In another embodiment, the breast cancer associated with the variants of the invention is estrogen receptor (ER) positive and/or progesterone receptor (PR) positive breast cancer. In one embodiment, the breast cancer associated with the variants of the invention is estrogen receptor (ER) positive. In another embodiment, the breast cancer associated with the variants of the invention is progesterone receptor (PR) positive. In one such embodiment, the markers described herein to be associated with increased risk or susceptibility of breast cancer confer increased risk or susceptibility of ER-positive and/or PR-positive breast cancer. Thus, in certain embodiments, presence of at least one of the at-risk variants of the invention is predictive of ER positive or PR positive breast cancer in the individual.

In some embodiments of the methods of the invention, the susceptibility determined in the method is increased susceptibility. In one such embodiment, the increased susceptibility is characterized by a relative risk (RR) of at least 1.10. In another embodiment, the increased susceptibility is characterized by a relative risk of at least 1.20. In another embodiment, the increased susceptibility is characterized by a relative risk of at least 1.30. In another embodiment, the increased susceptibility is characterized by a relative risk of at least 1.40. In yet another embodiment, the increased susceptibility is characterized by a relative risk of at least 1.50. In a further embodiment, the increased susceptibility is characterized by a relative risk of at least 1.70. In yet another embodiment, the increased susceptibility is characterized by a relative risk of at least 2.0. Other embodiments are characterized by relative risk of at least 1.10, 1.11, 1.12, 1.13, 1.14, 1.15, 1.16, 1.17, 1.18, 1.19, 1.20, 1.21, 1.22, 1.23, 1.24, 1.25, 1.26, 1.27, 1.28, 1.29, 1.30, 1.31, 1.32, 1.33, 1.34, 1.35. Other numeric values for risk bridging any of these above-mentioned values are also possible, and these are also within scope of the invention.
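In a case-control study the relative risk is usually not observed directly; an allelic odds ratio is estimated instead, and it approximates the relative risk when the disease is uncommon. The Python sketch below is an illustration, not the procedure used to generate the tables referred to herein: it computes an allelic odds ratio with a Woolf (log-scale) 95% confidence interval from allele counts and places the estimate against the 1.10 and 0.9 cut-offs named in these embodiments. The function names and the example counts are hypothetical.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval.

    Arguments are allele counts (two per genotyped individual):
    case_risk / case_other   at-risk and other allele counts in cases
    ctrl_risk / ctrl_other   at-risk and other allele counts in controls
    All counts must be positive.
    """
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) by Woolf's method.
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


def classify(estimate, increased_at=1.10, decreased_below=0.9):
    """Map a risk estimate onto the cut-offs used in the embodiments."""
    if estimate >= increased_at:
        return "increased"
    if estimate < decreased_below:
        return "decreased"
    return "neutral"


# Hypothetical allele counts, for illustration only.
or_, low, high = allelic_odds_ratio(1200, 1800, 1000, 2000)
print(f"OR = {or_:.2f} (95% CI {low:.2f}-{high:.2f}): {classify(or_)}")
```

With these hypothetical counts (1,200 at-risk and 1,800 other alleles in cases versus 1,000 and 2,000 in controls) the odds ratio is 4/3, about 1.33, which would fall in the increased-susceptibility class.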

In some embodiments of the methods of the invention, the susceptibility determined in the method is decreased susceptibility. In one such embodiment, the decreased susceptibility is characterized by a relative risk (RR) of less than 0.9. In another embodiment, the decreased susceptibility is characterized by a relative risk (RR) of less than 0.8. In another embodiment, the decreased susceptibility is characterized by a relative risk (RR) of less than 0.7. In yet another embodiment, the decreased susceptibility is characterized by a relative risk (RR) of less than 0.5. Other cutoffs, such as relative risk of less than 0.89, 0.88, 0.87, 0.86, 0.85, 0.84, 0.83, 0.82, 0.81, 0.80, 0.79, 0.78, 0.77, 0.76, 0.75, 0.74, 0.73, 0.72, 0.71, 0.70, and so on, are within scope of the invention.

The invention also relates to kits. In one such aspect, the invention relates to a kit for assessing susceptibility to breast cancer in a human individual, the kit comprising reagents necessary for selectively detecting at least one allele of at least one polymorphic marker on chromosome 5p12 or 10q26 in the genome of the individual, wherein the presence of the at least one allele is indicative of increased susceptibility to breast cancer. In another aspect, the invention relates to a kit for assessing susceptibility to breast cancer in a human individual, the kit comprising reagents for selectively detecting at least one allele of at least one polymorphic marker in the genome of the individual, wherein the polymorphic marker is selected from rs10941679 (SEQ ID NO:236), rs4415084 (SEQ ID NO:235), and rs1219648 (SEQ ID NO:237), and markers in linkage disequilibrium therewith, and wherein the presence of the at least one allele is indicative of a susceptibility to breast cancer. In one embodiment, the at least one polymorphic marker is selected from the markers set forth in Table 12, Table 13 and Table 14.

Kit reagents may in one embodiment comprise at least one contiguous oligonucleotide that hybridizes to a fragment of the genome of the individual comprising the at least one polymorphic marker. In another embodiment, the kit comprises at least one pair of oligonucleotides that hybridize to opposite strands of a genomic segment obtained from the subject, wherein each oligonucleotide primer pair is designed to selectively amplify a fragment of the genome of the individual that includes one polymorphism, wherein the polymorphism is selected from the group consisting of the polymorphisms as defined in Tables 12, 13 and 14, and wherein the fragment is at least 20 base pairs in size. In one embodiment, the oligonucleotide is completely complementary to the genome of the individual. In another embodiment, the kit further contains buffer and enzyme for amplifying said segment. In another embodiment, the reagents further comprise a label for detecting said fragment.

In one preferred embodiment, the kit comprises: a detection oligonucleotide probe that is from 5-100 nucleotides in length; an enhancer oligonucleotide probe that is from 5-100 nucleotides in length; and an endonuclease enzyme; wherein the detection oligonucleotide probe specifically hybridizes to a first segment of the nucleic acid whose nucleotide sequence is set forth in any one of SEQ ID NO:1-237, and wherein the detection oligonucleotide probe comprises a detectable label at its 3′ terminus and a quenching moiety at its 5′ terminus; wherein the enhancer oligonucleotide is from 5-100 nucleotides in length and is complementary to a second segment of the nucleotide sequence that is 5′ relative to the oligonucleotide probe, such that the enhancer oligonucleotide is located 3′ relative to the detection oligonucleotide probe when both oligonucleotides are hybridized to the nucleic acid; wherein a single base gap exists between the first segment and the second segment, such that when the oligonucleotide probe and the enhancer oligonucleotide probe are both hybridized to the nucleic acid, a single base gap exists between the oligonucleotides; and wherein treating the nucleic acid with the endonuclease will cleave the detectable label from the 3′ terminus of the detection probe to release free detectable label when the detection probe is hybridized to the nucleic acid.

Kits according to the present invention may also be used in the other methods of the invention, including methods of assessing risk of developing at least a second primary tumor in an individual previously diagnosed with breast cancer, methods of assessing an individual for probability of response to a breast cancer therapeutic agent, and methods of monitoring progress of a treatment of an individual diagnosed with breast cancer and given a treatment for the disease.

The markers that are described herein to be associated with breast cancer can all be used in the various aspects of the invention, including the methods, kits, uses, apparatus and procedures described herein. In certain embodiments, the invention relates to use of markers within chromosome 5p12. In certain other embodiments, the invention relates to markers within chromosome 10q26. In certain embodiments, the invention relates to the markers set forth in Table 1 or Table 3, and markers in linkage disequilibrium therewith. In certain other embodiments, the invention relates to the markers set forth in Table 3. In certain other embodiments, the invention relates to markers rs10941679, rs7703618, rs4415084, rs2067980, rs10035564, rs11743392, rs7716600, and rs1219648, and markers in linkage disequilibrium therewith. In some preferred embodiments, the invention relates to markers rs4415084, rs10941679 and rs1219648, and markers in linkage disequilibrium therewith. In some other preferred embodiments, the invention relates to markers as set forth in Table 12, Table 13 and Table 14 herein. In other preferred embodiments, the invention relates to rs4415084 and markers in linkage disequilibrium therewith (e.g., markers as set forth in Table 12). In other preferred embodiments, the invention relates to rs10941679 and markers in linkage disequilibrium therewith (e.g., markers as set forth in Table 13). In other preferred embodiments, the invention relates to rs1219648 and markers in linkage disequilibrium therewith (e.g., markers as set forth in Table 14). In one embodiment, the invention relates to marker rs4415084. In another embodiment, the invention relates to rs10941679. In another embodiment, the invention relates to rs1219648.

In certain embodiments, the at least one marker allele conferring increased risk of breast cancer is selected from rs10941679 allele G, rs7703618 allele T, rs4415084 allele G, rs2067980 allele G, rs10035564 allele G, rs11743392 allele T, rs7716600 allele A, and rs1219648 allele G. In these embodiments, the presence of the allele (the at-risk allele) is indicative of increased risk of breast cancer.
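A genotype dataset can be screened for the at-risk alleles listed above with a few lines of code. This Python sketch uses the marker and allele pairs exactly as given in the preceding paragraph and assumes, as an unverified simplification, that genotypes are reported on the same strand as those allele designations. The function names and the two-character genotype format (e.g. "AG") are hypothetical.

```python
# At-risk alleles as listed in the preceding paragraph (marker -> allele).
AT_RISK_ALLELES = {
    "rs10941679": "G",
    "rs7703618": "T",
    "rs4415084": "G",
    "rs2067980": "G",
    "rs10035564": "G",
    "rs11743392": "T",
    "rs7716600": "A",
    "rs1219648": "G",
}


def count_at_risk_alleles(genotypes):
    """Return {marker: copies of the at-risk allele (0, 1 or 2)}.

    `genotypes` maps marker IDs to two-character genotype strings such as
    "AG". Markers without a listed at-risk allele, and calls that are not
    exactly two characters (e.g. missing data), are skipped.
    """
    counts = {}
    for marker, genotype in genotypes.items():
        risk = AT_RISK_ALLELES.get(marker)
        if risk is None or len(genotype) != 2:
            continue
        counts[marker] = genotype.upper().count(risk)
    return counts


def carries_at_risk_allele(genotypes):
    """True if at least one listed at-risk allele is present."""
    return any(n > 0 for n in count_at_risk_alleles(genotypes).values())
```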

In certain embodiments of the invention, linkage disequilibrium is determined using the linkage disequilibrium measures r² and |D′|, which give a quantitative measure of the extent of linkage disequilibrium (LD) between two genetic elements (e.g., polymorphic markers). Certain numerical values of these measures for particular markers are indicative of the markers being in linkage disequilibrium, as described further herein. In one embodiment of the invention, linkage disequilibrium between markers (i.e., LD values indicative of the markers being in linkage disequilibrium) is defined as r²>0.1. In another embodiment, linkage disequilibrium is defined as r²>0.2. Other embodiments can include other definitions of linkage disequilibrium, such as r²>0.25, r²>0.3, r²>0.35, r²>0.4, r²>0.45, r²>0.5, r²>0.55, r²>0.6, r²>0.65, r²>0.7, r²>0.75, r²>0.8, r²>0.85, r²>0.9, r²>0.95, r²>0.96, r²>0.97, r²>0.98, or r²>0.99. Linkage disequilibrium can in certain embodiments also be defined as |D′|>0.2, or as |D′|>0.3, |D′|>0.4, |D′|>0.5, |D′|>0.6, |D′|>0.7, |D′|>0.8, |D′|>0.9, |D′|>0.95, |D′|>0.98 or |D′|>0.99. In certain embodiments, linkage disequilibrium is defined as fulfilling two criteria of r² and |D′|, such as r²>0.2 and |D′|>0.8. Other combinations of values for r² and |D′| are also possible and within scope of the present invention, including but not limited to the values for these parameters set forth above.
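For two biallelic markers with alleles A/a and B/b, both measures follow from the four haplotype frequencies: D = p_AB − p_A·p_B, D′ = D/D_max, and r² = D²/(p_A(1−p_A)p_B(1−p_B)), where D_max = min(p_A(1−p_B), (1−p_A)p_B) when D ≥ 0 and min(p_A·p_B, (1−p_A)(1−p_B)) otherwise. The Python sketch below assumes phased haplotype frequencies are already available (in practice they are commonly estimated from unphased genotypes, for example with an EM algorithm, which is not shown) and applies the combined criterion r²>0.2 and |D′|>0.8 mentioned above. The function names are hypothetical.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D', r^2) from the four two-marker haplotype frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-6:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at the first marker
    p_B = p_AB + p_aB  # frequency of allele B at the second marker
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both markers must be polymorphic")
    # Largest |D| attainable given the allele frequencies, by sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return d / d_max, d * d / denom


def in_ld(d_prime, r2, r2_min=0.2, d_prime_min=0.8):
    """Combined criterion of the embodiment: r^2 > 0.2 and |D'| > 0.8."""
    return r2 > r2_min and abs(d_prime) > d_prime_min
```

The two measures answer different questions: |D′| reaches 1 whenever one of the four haplotypes is absent, even for markers of very different frequency, whereas r² is high only when one marker is a good proxy for the other, which is why the embodiments allow them to be combined.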

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention.

FIG. 1 shows a map of association data on 5p12 from the Iceland 1 cohort. The upper panel shows the P-values for the association signals derived from the Illumina Hap300 data from the Iceland 1 cohort of 1660 breast cancer patients and 11,563 controls, plotted according to their physical location (NCBI Build 34). The signals from the key SNPs defining the 6 equivalence classes in the region are labelled A-F. The lower panel shows the locations of recombination hotspots, chromosome bands, exons of known genes and recombination rates. At the bottom are plotted pairwise r² values derived from HapMap Phase II data (release 19). The intensity of the dots is proportional to the magnitude of the pairwise r² value. Recombination hotspots and recombination rates are derived using methods described by McVean et al. 2004 (see text).

DETAILED DESCRIPTION OF THE INVENTION

The present invention discloses polymorphic variants and haplotypes that have been found to be associated with breast cancer. Particular alleles at polymorphic markers on chromosome 5p12 have been found to be associated with breast cancer. Such markers and haplotypes are useful for diagnostic purposes, for methods of predicting drug response, and for methods of predicting treatment progress, as described in further detail herein. Further applications of the present invention include methods for assessing response to breast cancer therapy by surgery or radiation utilizing the polymorphic markers of the invention, as well as kits for use in the methods of the invention.

DEFINITIONS

Unless otherwise indicated, nucleic acid sequences are written left to right in a 5′ to 3′ orientation. Numeric ranges recited within the specification are inclusive of the numbers defining the range and include each integer or any non-integer fraction within the defined range. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by the ordinary person skilled in the art to which the invention pertains.

The following terms shall, in the present context, have the meaning as indicated:

A “polymorphic marker”, sometimes referred to as a “marker”, as described herein, refers to a genomic polymorphic site. Each polymorphic marker has at least two sequence variations characteristic of particular alleles at the polymorphic site. Thus, genetic association to a polymorphic marker implies that there is association to at least one specific allele of that particular polymorphic marker. The marker can comprise any allele of any variant type found in the genome, including single nucleotide polymorphisms (SNPs), mini- or microsatellites, translocations and copy number variations (insertions, deletions, duplications). Polymorphic markers can be of any measurable frequency in the population. For mapping of disease genes, polymorphic markers with population frequency higher than 5-10% are in general most useful. However, polymorphic markers may also have lower population frequencies, such as 1-5% frequency, or even lower frequency, in particular copy number variations (CNVs). The term shall, in the present context, be taken to include polymorphic markers with any population frequency.

An “allele” refers to the nucleotide sequence of a given locus (position) on a chromosome. A polymorphic marker allele thus refers to the composition (i.e., sequence) of the marker on a chromosome. Genomic DNA from an individual contains two alleles for any given polymorphic marker, representative of each copy of the marker on each chromosome. Sequence codes for nucleotides used herein are: A=1, C=2, G=3, T=4. For microsatellite alleles, the CEPH sample (Centre d'Etudes du Polymorphisme Humain, genomics repository, CEPH sample 1347-02) is used as a reference: the shorter allele of each microsatellite in this sample is set as 0 and all other alleles in other samples are numbered in relation to this reference. Thus, e.g., allele 1 is 1 bp longer than the shorter allele in the CEPH sample, allele 2 is 2 bp longer than the shorter allele in the CEPH sample, allele 3 is 3 bp longer than the shorter allele in the CEPH sample, etc., and allele −1 is 1 bp shorter than the shorter allele in the CEPH sample, allele −2 is 2 bp shorter than the shorter allele in the CEPH sample, etc.
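The two numbering conventions in this paragraph can be sketched as follows. The reference length passed to `microsatellite_allele` stands for the size of the shorter allele in CEPH sample 1347-02, which is not given here; the function names and the example lengths in the usage line are hypothetical.

```python
# Numeric codes for SNP alleles, as defined in the paragraph above.
NUCLEOTIDE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def encode_snp_allele(base):
    """Translate a nucleotide letter into its numeric allele code."""
    return NUCLEOTIDE_CODES[base.upper()]


def microsatellite_allele(length_bp, ref_shorter_length_bp):
    """Allele number relative to the shorter CEPH 1347-02 allele (set as 0).

    Positive numbers are longer than the reference, negative shorter.
    """
    return length_bp - ref_shorter_length_bp


# Hypothetical reference allele of 150 bp: a 152 bp fragment is allele 2.
print(microsatellite_allele(152, 150))
```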

A “Single Nucleotide Polymorphism” or “SNP” is a DNA sequence variation occurring when a single nucleotide at a specific location in the genome differs between members of a species or between paired chromosomes in an individual. Most SNP polymorphisms have two alleles. Each individual is in this instance either homozygous for one allele of the polymorphism (i.e. both chromosomal copies of the individual have the same nucleotide at the SNP location), or the individual is heterozygous (i.e. the two homologous chromosomes of the individual contain different nucleotides). The SNP nomenclature as reported herein refers to the official Reference SNP (rs) ID identification tag as assigned to each unique SNP by the National Center for Biotechnology Information (NCBI).

Nucleotide ambiguity codes as described herein are as proposed by IUPAC-IUB. These codes are compatible with the codes used by the EMBL, GenBank, and PIR databases.

IUB code    Meaning
A           Adenosine
C           Cytidine
G           Guanine
T           Thymidine
R           G or A
Y           T or C
K           G or T
M           A or C
S           G or C
W           A or T
B           C, G or T
D           A, G or T
H           A, C or T
V           A, C or G
N           A, C, G or T (any base)
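The ambiguity codes in the table above can be used to test whether an observed base is compatible with a degenerate sequence position. A minimal Python sketch, with hypothetical function names:

```python
# IUPAC-IUB nucleotide codes mapped to the bases each one stands for.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def matches(code, base):
    """True if the single base is one of those represented by the code."""
    return len(base) == 1 and base.upper() in IUPAC_CODES[code.upper()]


def sequence_matches(pattern, sequence):
    """Position-by-position match of a degenerate pattern to a sequence."""
    return len(pattern) == len(sequence) and all(
        matches(code, base) for code, base in zip(pattern, sequence)
    )
```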

A nucleotide position at which more than one sequence is possible in a population (either a natural population or a synthetic population, e.g., a library of synthetic molecules) is referred to herein as a “polymorphic site”.

A “variant”, as described herein, refers to a segment of DNA that differs from the reference DNA. A “marker” or a “polymorphic marker”, as defined herein, is a variant. Alleles that differ from the reference are referred to as “variant” alleles.

A “microsatellite” is a polymorphic marker that has multiple small repeats of bases that are 2-8 nucleotides in length (such as CA repeats) at a particular site, in which the number of repeat lengths varies in the general population.

An “indel” is a common form of polymorphism comprising a small insertion or deletion that is typically only a few nucleotides long.

A “haplotype,” as described herein, refers to a segment of genomic DNA within one strand of DNA that is characterized by a specific combination of alleles arranged along the segment. For diploid organisms such as humans, a haplotype comprises one member of the pair of alleles for each polymorphic marker or locus. In a certain embodiment, the haplotype can comprise two or more alleles, three or more alleles, four or more alleles, or five or more alleles.

The term “susceptibility”, as described herein, encompasses both increased susceptibility and decreased susceptibility. Thus, particular polymorphic markers and/or haplotypes of the invention may be characteristic of increased susceptibility (i.e., increased risk) of breast cancer, as characterized by a relative risk (RR) of greater than one, or as an odds ratio (OR) of greater than one. Alternatively, the markers and/or haplotypes of the invention are characteristic of decreased susceptibility (i.e., decreased risk) of breast cancer, as characterized by a relative risk of less than one, or an odds ratio of less than one. Haplotypes are described herein in the context of the marker name and the allele of the marker in that haplotype, e.g., “T rs4415084” refers to the T allele of marker rs4415084 being in the haplotype, and this nomenclature is equivalent to “rs4415084 allele T” and “T-rs4415084”. Furthermore, allelic codes in haplotypes are as for individual markers, i.e. 1=A, 2=C, 3=G and 4=T.

The term “susceptibility”, as described herein, refers to the proneness of an individual towards the development of a certain state (e.g., a certain trait, phenotype or disease, e.g., breast cancer), or towards being less able to resist a particular state than the average individual. The term encompasses both increased susceptibility and decreased susceptibility. Thus, particular alleles at polymorphic markers and/or haplotypes of the invention as described herein may be characteristic of increased susceptibility (i.e., increased risk) of breast cancer, as characterized by a relative risk (RR) or odds ratio (OR) of greater than one for the particular allele or haplotype. Alternatively, the markers and/or haplotypes of the invention are characteristic of decreased susceptibility (i.e., decreased risk) of breast cancer, as characterized by a relative risk of less than one.

The term “and/or” shall in the present context be understood to indicate that either or both of the items connected by it are involved. In other words, the term herein shall be taken to mean “one or the other or both”.

The term “look-up table”, as described herein, is a table that correlates one form of data to another form, or one or more forms of data to a predicted outcome to which the data is relevant, such as phenotype or trait. For example, a look-up table can comprise a correlation between allelic data for at least one polymorphic marker and a particular trait or phenotype, such as a particular disease diagnosis, that an individual who comprises the particular allelic data is likely to display, or is more likely to display than individuals who do not comprise the particular allelic data. Look-up tables can be multidimensional, i.e. they can contain information about multiple alleles for single markers simultaneously, or they can contain information about multiple markers, and they may also comprise other factors, such as particulars about disease diagnoses, racial information, biomarkers, biochemical measurements, therapeutic methods or drugs, etc.
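As an illustration of a simple look-up table, the Python sketch below maps (marker, genotype) pairs to a risk relative to non-carriers. It assumes a multiplicative per-allele model (a genotype carrying k copies of the at-risk allele is assigned rr**k), which is one common modelling choice and not necessarily the one used herein. The function names are hypothetical, and the per-allele value in the example line is a placeholder, not a result reported in this document.

```python
def build_lookup_table(markers):
    """Build {(marker, genotype): relative risk} for biallelic markers.

    `markers` is an iterable of (marker_id, risk_allele, other_allele,
    per_allele_rr) tuples. Genotype keys are stored with alleles sorted
    alphabetically so that "GA" and "AG" resolve to the same entry.
    Assumes a multiplicative model: k risk alleles -> per_allele_rr ** k.
    """
    table = {}
    for marker, risk, other, per_allele_rr in markers:
        for genotype in (other + other, other + risk, risk + risk):
            copies = genotype.count(risk)
            table[(marker, "".join(sorted(genotype)))] = per_allele_rr ** copies
    return table


def look_up(table, marker, genotype):
    """Return the tabulated relative risk for a marker genotype."""
    return table[(marker, "".join(sorted(genotype.upper())))]


# Placeholder per-allele value, for illustration only.
TABLE = build_lookup_table([("rs1219648", "G", "A", 1.2)])
```

Keying on the marker as well as the genotype lets the same table hold entries for several markers, in the spirit of the multidimensional tables described above; further dimensions (e.g. ER status) could be added to the key in the same way.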

A “computer-readable medium” is an information storage medium that can be accessed by a computer using a commercially available or custom-made interface. Exemplary computer-readable media include memory (e.g., RAM, ROM, flash memory, etc.), optical storage media (e.g., CD-ROM), magnetic storage media (e.g., computer hard drives, floppy disks, etc.), punch cards, or other commercially available media. Information may be transferred between a system of interest and a medium, between computers, or between computers and the computer-readable medium for storage or access of stored information. Such transmission can be electrical, or by other available methods, such as IR links, wireless connections, etc.

A “nucleic acid sample” is a sample obtained from an individual that contains nucleic acid (DNA or RNA). In certain embodiments, e.g., for the detection of specific polymorphic markers and/or haplotypes, the nucleic acid sample comprises genomic DNA. Such a nucleic acid sample can be obtained from any source that contains genomic DNA, such as a blood sample, sample of amniotic fluid, sample of cerebrospinal fluid, or tissue sample from skin, muscle, buccal or conjunctival mucosa, placenta, gastrointestinal tract or other organs.

The term “breast cancer therapeutic agent” refers to an agent that can be used to ameliorate or prevent symptoms associated with breast cancer.

The term “breast cancer-associated nucleic acid”, as described herein, refers to a nucleic acid that has been found to be associated to breast cancer. This includes, but is not limited to, the markers and haplotypes described herein and markers and haplotypes in strong linkage disequilibrium (LD) therewith.

The term “Breast Cancer”, as described herein, refers to any clinical diagnosis of breast cancer, and includes any and all particular subphenotypes of breast cancer. For example, breast cancer is sometimes categorized as estrogen receptor (ER) positive or estrogen receptor negative breast cancer; breast cancer is sometimes also categorized as progesterone receptor (PR) positive or negative. Breast cancer is furthermore sometimes diagnosed as invasive ductal, as invasive lobular, as tubular, or as otherwise invasive or mixed invasive. Breast cancer can also be categorized as medullary, DCIS (Ductal Carcinoma In-Situ), LCIS (Lobular Carcinoma In-Situ), or otherwise non-invasive. Invasive breast cancer can also be defined as stage 0, stage 1, stage 2 (including stage 2a and stage 2b), stage 3 (including stage 3a, stage 3b and stage 3c) or stage 4 breast cancer. In the present context, “breast cancer” can include any of these subphenotypes of breast cancer, and also includes any other clinically applicable subphenotypes of breast cancer.

The term “All Breast Cancer”, or “All BC”, refers to all individuals diagnosed with breast cancer.

The term “Medium Predisposition” breast cancer, or “MedPre” breast cancer, refers to a sub-phenotype of breast cancer. The definition of this phenotype requires that the proband fulfills at least one of the following criteria:

- The proband is a member of a cluster of breast cancer cases containing 3 or more affected relatives within a genetic distance of 3 meiotic events (3M).
- The proband is a member of an affected pair related within 3M, one of whom was diagnosed when aged 50 or younger.
- The proband is a member of an affected pair related within 3M, one of whom was diagnosed with a second primary tumor of any type.
- The proband has been diagnosed with a second primary tumor of any type.
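The four criteria can be expressed as a single predicate. This Python sketch is a simplification under stated assumptions: relatives are recorded only by their meiotic distance to the proband (not to each other), the cluster in the first criterion is counted including the proband, and the `Case`/`Proband` record layout is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """An individual diagnosed with breast cancer (hypothetical layout)."""
    age_at_diagnosis: int
    second_primary: bool = False  # second primary tumor of any type


@dataclass
class Proband(Case):
    # Affected relatives as (meiotic distance to the proband, Case) pairs.
    relatives: list = field(default_factory=list)


def is_medpre(proband, max_meioses=3, min_cluster=3, young_age=50):
    """True if the proband meets at least one of the four MedPre criteria."""
    # Criterion 4: the proband has a second primary tumor.
    if proband.second_primary:
        return True
    close = [rel for dist, rel in proband.relatives if dist <= max_meioses]
    # Criterion 1: cluster of 3 or more cases within 3M (proband counted,
    # which is an assumption of this sketch).
    if len(close) + 1 >= min_cluster:
        return True
    # Criteria 2 and 3: an affected pair within 3M in which one member was
    # diagnosed at age 50 or younger, or had a second primary tumor.
    for rel in close:
        pair = (proband, rel)
        if any(p.age_at_diagnosis <= young_age for p in pair):
            return True
        if any(p.second_primary for p in pair):
            return True
    return False
```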

The term “Multiple Primary Breast Cancer”, or “MPBC”, as described herein, refers to cases where at least one primary tumor is diagnosed in addition to the first breast cancer diagnosis, and the two tumors are confirmed both clinically and by histology to be independent primary tumors, arising simultaneously with or subsequently to the first breast cancer and occurring in the contralateral or ipsilateral breast.

The term “family history score” or “FHS”, as described herein, is defined based on the number of relatives affected with breast cancer for a proband with the disease. For each proband, a score of 1 is assigned for each affected first-degree relative, 0.5 for each affected second-degree relative, and 0.25 for each affected third-degree relative. The total sum thus obtained over all affected relatives represents the summed family history score or FHS.
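The score is a weighted count of affected relatives. A minimal Python sketch, assuming each affected relative is listed by degree of relationship and that relatives more distant than third degree contribute nothing (the definition above gives weights only up to third degree); the names are hypothetical.

```python
# Weight per affected relative, by degree of relationship to the proband.
FHS_WEIGHTS = {1: 1.0, 2: 0.5, 3: 0.25}


def family_history_score(affected_relative_degrees):
    """Summed family history score (FHS) for one proband."""
    return sum(FHS_WEIGHTS.get(degree, 0.0) for degree in affected_relative_degrees)


# One first-degree, two second-degree and one third-degree affected relative.
print(family_history_score([1, 2, 2, 3]))
```

For that example the score is 1 + 0.5 + 0.5 + 0.25 = 2.25.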

The term “estrogen receptor positive breast cancer”, or “ER-positive breast cancer”, as described herein, refers to tumors determined to be positive for estrogen receptor. In the present context, ER levels of greater than or equal to 10 fmol/mg and/or an immunohistochemical observation of greater than or equal to 10% positive nuclei is considered to be ER positive. Breast cancer that does not fulfill the criteria of being ER positive is defined herein as “ER negative” or “estrogen receptor negative”.

The term “progesterone receptor positive breast cancer”, or “PR-positive breast cancer”, as described herein, refers to tumors determined to be positive for progesterone receptor. In the present context, PR levels of greater than or equal to 10 fmol/mg and/or an immunohistochemical observation of greater than or equal to 10% positive nuclei is considered to be PR positive. Breast cancer that does not fulfill the criteria of being PR positive is defined herein as “PR negative” or “progesterone receptor negative”.
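The ER and PR definitions share one rule. The Python sketch below reads the "and/or" as: positive if either available measurement reaches its cut-off (10 fmol/mg for the receptor assay, 10% positive nuclei for immunohistochemistry). Treating a tumor with no measurement at all as an error, rather than as negative, is an assumption of this sketch, as are the function and parameter names.

```python
def receptor_positive(fmol_per_mg=None, percent_positive_nuclei=None,
                      fmol_cutoff=10.0, percent_cutoff=10.0):
    """Classify ER or PR status from an assay level and/or IHC result.

    Returns True (positive) if either supplied measurement is at or above
    its cut-off, otherwise False (negative). Raises ValueError if neither
    measurement is supplied.
    """
    if fmol_per_mg is None and percent_positive_nuclei is None:
        raise ValueError("no receptor measurement supplied")
    by_assay = fmol_per_mg is not None and fmol_per_mg >= fmol_cutoff
    by_ihc = (percent_positive_nuclei is not None
              and percent_positive_nuclei >= percent_cutoff)
    return by_assay or by_ihc
```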

The term “chromosome 5p12”, as described herein, refers to the region on chromosome 5 between positions 44,094,392 and 46,393,984 of NCBI (National Center for Biotechnology Information) Build 34.
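A variant position can be tested against this interval as below. The coordinates are the NCBI Build 34 values given above, so positions reported on a later genome assembly would first need to be converted to Build 34. The function name is hypothetical, and the bounds are treated as inclusive, consistent with the convention for numeric ranges stated in the definitions.

```python
# The chromosome 5p12 region as defined above (NCBI Build 34 coordinates).
CHR5P12_BUILD34 = ("5", 44_094_392, 46_393_984)


def in_region(chrom, position, region=CHR5P12_BUILD34):
    """True if (chrom, position) lies inside the region, bounds included."""
    name, start, end = region
    chrom = str(chrom)
    if chrom.lower().startswith("chr"):  # accept "chr5" as well as "5"
        chrom = chrom[3:]
    return chrom == name and start <= position <= end
```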

The term “FGF10” or “FGF10 gene”, as described herein, refers to the Fibroblast Growth Factor 10 gene on human chromosome 5p.

The term “MRPS30” or “MRPS30 gene”, as described herein, refers to the Mitochondrial Ribosomal Protein S30 gene on human chromosome 5p. This gene is also called programmed cell death protein 9 (PDCD9), and encodes a protein of the mitochondrial 28S ribosomal subunit.

The term “FGFR2” or “FGFR2 gene”, as described herein, refers to the Fibroblast Growth Factor Receptor 2 gene on human chromosome 10q26. This gene is also called Protein Tyrosine Kinase Receptor Like 14 (TK14), Keratinocyte Growth Factor Receptor (KGFR), and Fibroblast Growth Factor Receptor BEK.

Through association analysis of a population of individuals diagnosed with breast cancer according to the present invention, it has been discovered that certain alleles at certain polymorphic markers on chromosome 5p12 are associated with breast cancer. A genome-wide analysis for variants associated with cancer revealed association of breast cancer to a region of chromosome 5, between positions 44,094,392 and 46,393,984 (NCBI Build 34 coordinates), referred to herein as the chromosome 5p12 region. Particular markers in this region were found to be associated with an increased risk of breast cancer.

Through genotyping of approximately 1,600 Icelandic breast cancer patients and 11,563 controls using the Illumina HumanHap300 microarray technology, a large number of markers on chromosome 5p were found to show association to breast cancer (Table 1). In particular, the T allele of marker rs4415084 and the G allele of marker rs7703618 were found to be associated with an increased risk of breast cancer. The association of marker rs7703618 was replicated in a second Icelandic cohort, showing that the association signal is indeed significant.

A comparison of the Iceland discovery cohort with the public CGEMS data set revealed that association to rs4415084 is also replicated in this cohort. In fact, when the two data sets are merged, the association signal to this marker (p-value 9.02E-06 in the Icelandic discovery cohort alone) reaches significance at the genome-wide level after Bonferroni correction, with a nominal p-value of 1.38E-07. This SNP had an unremarkable p-value of 2.21E-03 in the CGEMS data set alone, but it does replicate the original finding in the Icelandic population.
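The Bonferroni step divides the significance level by the number of tests performed. The number of markers tested is not stated in this section, so the Python sketch below uses a hypothetical round figure of 300,000 tests, of the order of the SNP content of a HumanHap300-type array, purely to illustrate the arithmetic: at α = 0.05 the per-test threshold is about 1.67E-07, which the merged p-value of 1.38E-07 falls below while the single-cohort p-values do not. The function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def genome_wide_significant(p_value, n_tests, alpha=0.05):
    """True if a nominal p-value survives Bonferroni correction."""
    return p_value < bonferroni_threshold(alpha, n_tests)


N_TESTS = 300_000  # hypothetical number of markers tested, for illustration

# Nominal p-values for rs4415084 quoted in the paragraph above.
for label, p in [("Iceland discovery", 9.02e-6),
                 ("CGEMS alone", 2.21e-3),
                 ("merged", 1.38e-7)]:
    print(label, p, genome_wide_significant(p, N_TESTS))
```

Bonferroni correction is conservative when markers are correlated through linkage disequilibrium, since the effective number of independent tests is then smaller than the number of markers.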

Marker rs10941679, which is correlated with marker rs4415084 (D′=0.99, r²=0.51), shows an even stronger association with breast cancer (OR=1.19, p-value 2.2E-06). Follow-up analysis has confirmed the signal due to rs4415084 and rs10941679 in cohorts from Sweden, Holland, Spain and the US (see Table 6).

The present invention also shows evidence of allelic heterogeneity in the Chr5p12 region, and six equivalence classes, represented by the key markers rs7703618, rs4415084, rs2067980, rs10035564, rs11743392 and rs7716600, have been identified. Further analysis has established that the observed association signal is mostly accounted for by markers rs4415084 and rs10941679.

There are three known genes of note in the region identified by the present invention as harboring markers and haplotypes associating with breast cancer: FGF10, MRPS30, and HCN1, along with the poorly characterized gene LOC441070. Two of these genes, FGF10 and MRPS30, are compelling candidates for an involvement in breast cancer predisposition.

FGF10 is required for normal embryonic development of the breast [Howard and Ashworth, (2006), PLoS Genet, 2, e112]; it has been implicated as an oncogene in mouse models of breast cancer by MMTV insertional mutagenesis, and it is overexpressed in around 10% of human breast cancers [Theodorou, et al., (2004), Oncogene, 23, 6047-55]. The FGF10 gene is separated from the main clusters of association signals by a recombination hotspot. However, key elements controlling regulation of FGF10 may be present in the region where the strong association signals occur. Alternatively, the association signals may be in linkage disequilibrium with pathogenic mutations within the FGF10 gene itself.

The MRPS30 gene, also known as programmed cell death protein 9 (PDCD9), encodes a protein component of the 28S (small) subunit of the mitochondrial ribosome. This gene is the mammalian counterpart of the Gallus gallus pro-apoptotic protein p52. It has been shown to induce apoptosis and activate the stress-responsive JNK1 pathway in mammalian cells. The protein appears to function in apoptosis at least in part through the Bcl-2 pathway [Sun, et al., (1998), Gene, 208, 157-66; Carim, et al., (1999), Cytogenet Cell Genet, 87, 85-8; Cavdar Koc, et al., (2001), FEBS Lett, 492, 166-70]. Although MRPS30 has not previously been implicated in breast cancer, its involvement in the above pathways suggests that genetic variants in MRPS30 may modify breast cancer risk.

It has also been discovered that marker rs1219648 at the FGFR2 locus on chromosome 10 confers risk of breast cancer (Table 6), particularly of ER-positive tumours (Table 10). It was further discovered that the association to rs1219648 is more significant in node-positive than in node-negative tumours, and that the association is stronger in individuals with a family history of breast cancer.

Assessment for Markers and Haplotypes

The genomic sequence within populations is not identical when individuals are compared. Rather, the genome exhibits sequence variability between individuals at many locations in the genome. Such variations in sequence are commonly referred to as polymorphisms, and there are many such sites within each genome. For example, the human genome exhibits sequence variations which occur on average every 500 base pairs. The most common sequence variant consists of base variations at a single base position in the genome, and such sequence variants, or polymorphisms, are commonly called Single Nucleotide Polymorphisms (“SNPs”). These SNPs are believed to have arisen by a single mutational event, and therefore there are usually two possible alleles at each SNP site: the original allele and the mutated (alternate) allele. Due to natural genetic drift and possibly also selective pressure, the original mutation has resulted in a polymorphism characterized by a particular frequency of its alleles in any given population. Many other types of sequence variants are found in the human genome, including mini- and microsatellites, insertions, deletions and inversions; insertions and deletions that change the number of copies of a genomic segment are also called copy number variations (CNVs). A polymorphic microsatellite has multiple small repeats of bases (such as CA repeats, TG on the complementary strand) at a particular site, in which the number of repeat units varies in the general population. In general terms, each version of the sequence with respect to the polymorphic site represents a specific allele of the polymorphic site. All sequence variants can be referred to as polymorphisms, occurring at specific polymorphic sites characteristic of the sequence variant in question. In general terms, polymorphisms can comprise any number of specific alleles. Thus in one embodiment of the invention, the polymorphism is characterized by the presence of two or more alleles in any given population.
In another embodiment, the polymorphism is characterized by the presence of three or more alleles. In other embodiments, the polymorphism is characterized by four or more alleles, five or more alleles, six or more alleles, seven or more alleles, eight or more alleles, nine or more alleles, or ten or more alleles. All such polymorphisms can be utilized in the methods and kits of the present invention, and are thus within the scope of the invention.

Due to their abundance, SNPs account for the majority of sequence variants in the human genome. Over 6 million SNPs have been validated to date (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi). However, CNVs are receiving increased attention. These large-scale polymorphisms (typically 1 kb or larger) account for polymorphic variation affecting a substantial proportion of the assembled human genome; known CNVs cover over 15% of the human genome sequence (Estivill, X. and Armengol, L., PLoS Genetics 3:1787-99 (2007); http://projects.tcag.ca/variation/). Most of these polymorphisms are, however, very rare, and on average affect only a fraction of the genomic sequence of each individual. CNVs are known to affect gene expression, phenotypic variation and adaptation by disrupting gene dosage, and are also known to cause disease (microdeletion and microduplication disorders) and confer risk of common complex diseases, including HIV-1 infection and glomerulonephritis (Redon, R., et al., Nature 444:444-454 (2006)). It is thus possible that either previously described or unknown CNVs represent causative variants in linkage disequilibrium with the breast cancer-associated markers described herein. Methods for detecting CNVs include comparative genomic hybridization (CGH) and genotyping, including use of genotyping arrays, as described by Carter (Nature Genetics 39:S16-S21 (2007)). The Database of Genomic Variants (http://projects.tcag.ca/variation/) contains updated information about the location, type and size of described CNVs. The database currently contains data for over 15,000 CNVs.

In some instances, reference is made to different alleles at a polymorphic site without choosing a reference allele. Alternatively, a reference sequence can be referred to for a particular polymorphic site. The reference allele is sometimes referred to as the “wild-type” allele and it usually is chosen as either the first sequenced allele or as the allele from a “non-affected” individual (e.g., an individual that does not display a trait or disease phenotype).

Alleles for SNP markers as referred to herein refer to the bases A, C, G or T as they occur at the polymorphic site in the SNP assay employed. The allele codes for SNPs used herein are as follows: 1=A, 2=C, 3=G, 4=T. The person skilled in the art will, however, realize that by assaying or reading the opposite DNA strand, the complementary allele can in each case be measured. Thus, for a polymorphic site (polymorphic marker) characterized by an A/G polymorphism, the assay employed may be designed to specifically detect the presence of one or both of the two bases possible, i.e. A and G. Alternatively, by designing an assay to detect the opposite strand on the DNA template, the presence of the complementary bases T and C can be measured. Quantitatively (for example, in terms of relative risk), identical results would be obtained from measurement of either DNA strand (+ strand or − strand).
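By way of illustration only, the allele coding and strand complementation described above can be sketched in Python as follows. The function names are hypothetical and not part of any assay software. The sketch assumes a non-palindromic SNP such as A/G: for A/T and C/G SNPs the complementary alleles coincide with the alleles themselves, so the strand cannot be inferred from the bases alone.

```python
# Allele codes used herein: 1=A, 2=C, 3=G, 4=T.
ALLELE_CODES = {1: "A", 2: "C", 3: "G", 4: "T"}
# Watson-Crick complement, i.e. the base read on the opposite DNA strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def decode(code: int) -> str:
    """Translate a numeric allele code into its base."""
    return ALLELE_CODES[code]


def to_opposite_strand(genotype: tuple) -> tuple:
    """Return a diploid genotype as it would be read on the opposite strand."""
    return tuple(COMPLEMENT[base] for base in genotype)


def allele_count(genotypes: list, allele: str) -> int:
    """Count copies of an allele across a list of diploid genotypes."""
    return sum(g.count(allele) for g in genotypes)


# An A/G SNP genotyped on the + strand in three hypothetical individuals.
plus_strand = [("A", "G"), ("G", "G"), ("A", "A")]
minus_strand = [to_opposite_strand(g) for g in plus_strand]

# Counting A on the + strand equals counting T on the - strand, so any
# risk estimate derived from these counts is the same for either strand.
print(decode(1), allele_count(plus_strand, "A"), allele_count(minus_strand, "T"))
# prints: A 3 3
```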

Typically, a reference sequence is referred to for a particular sequence. Alleles that differ from the reference are sometimes referred to as “variant” alleles. A variant sequence, as used herein, refers to a sequence that differs from the reference sequence but is otherwise substantially similar. Alleles at the polymorphic genetic markers described herein are variants. Variants can include changes that affect a polypeptide. Sequence differences, when compared to a reference nucleotide sequence, can include the insertion or deletion of a single nucleotide, or of more than one nucleotide, resulting in a frame shift; the change of at least one nucleotide, resulting in a change in the encoded amino acid; the change of at least one nucleotide, resulting in the generation of a premature stop codon; the deletion of several nucleotides, resulting in a deletion of one or more amino acids encoded by the nucleotides; the insertion of one or several nucleotides, such as by unequal recombination or gene conversion, resulting in an interruption of the coding sequence of a reading frame; duplication of all or a part of a sequence; transposition; or a rearrangement of a nucleotide sequence. Such sequence changes can alter the polypeptide encoded by the nucleic acid. For example, if the change in the nucleic acid sequence causes a frame shift, the frame shift can result in a change in the encoded amino acids, and/or can result in the generation of a premature stop codon, causing generation of a truncated polypeptide. Alternatively, a polymorphism associated with a disease or trait can be a synonymous change in one or more nucleotides (i.e., a change that does not result in a change in the amino acid sequence). Such a polymorphism can, for example, alter splice sites, affect the stability or transport of mRNA, or otherwise affect the transcription or translation of an encoded polypeptide.
It can also alter DNA to increase the possibility that structural changes, such as amplifications or deletions, occur at the somatic level. The polypeptide encoded by the reference nucleotide sequence is the “reference” polypeptide with a particular reference amino acid sequence, and polypeptides encoded by variant alleles are referred to as “variant” polypeptides with variant amino acid sequences.

A haplotype refers to a segment of DNA that is characterized by a specific combination of alleles arranged along the segment. For diploid organisms such as humans, a haplotype comprises one member of the pair of alleles for each polymorphic marker or locus. In certain embodiments, the haplotype can comprise two or more alleles, three or more alleles, four or more alleles, or five or more alleles, each allele corresponding to a specific polymorphic marker along the segment. Haplotypes can comprise a combination of various polymorphic markers, e.g., SNPs and microsatellites, having particular alleles at the polymorphic sites. The haplotypes thus comprise a combination of alleles at various genetic markers.

Detecting specific polymorphic markers and/or haplotypes can be accomplished by methods known in the art for detecting sequences at polymorphic sites. For example, standard techniques for genotyping for the presence of SNPs and/or microsatellite markers can be used, such as fluorescence-based techniques (Chen, X. et al., Genome Res. 9(5): 492-98 (1999)), utilizing PCR, LCR, nested PCR and other techniques for nucleic acid amplification. Specific commercial methodologies available for SNP genotyping include, but are not limited to, TaqMan genotyping assays and SNPlex platforms (Applied Biosystems), gel electrophoresis (Applied Biosystems), mass spectrometry (e.g., MassARRAY system from Sequenom), minisequencing methods, real-time PCR, Bio-Plex system (BioRad), CEQ and SNPstream systems (Beckman), array hybridization technology (e.g., Affymetrix GeneChip; Perlegen), BeadArray Technologies (e.g., Illumina GoldenGate and Infinium assays), array tag technology (e.g., ParAllele), and endonuclease-based fluorescence hybridization technology (Invader; Third Wave). Some of the available array platforms, including Affymetrix SNP Array 6.0 and Illumina CNV370-Duo and 1M BeadChips, include SNPs that tag certain CNVs. This allows detection of CNVs via surrogate SNPs included in these platforms. Thus, by use of these or other methods available to the person skilled in the art, one or more alleles at polymorphic markers, including microsatellites, SNPs or other types of polymorphic markers, can be identified.

In certain methods described herein, an individual who is at an increased susceptibility (i.e., increased risk) for breast cancer is an individual in whom at least one specific allele at one or more polymorphic markers or haplotypes conferring increased susceptibility for breast cancer is identified (i.e., at-risk marker alleles or haplotypes). In one aspect, the at-risk marker or haplotype is one that confers a significant increased risk (or susceptibility) of breast cancer. In one embodiment, significance associated with a marker or haplotype is measured by a relative risk (RR). In another embodiment, significance associated with a marker or haplotype is measured by an odds ratio (OR). In a further embodiment, the significance is measured by a percentage. In one embodiment, a significant increased risk is measured as a risk (relative risk and/or odds ratio) of at least 1.10, including but not limited to: at least 1.11, at least 1.12, at least 1.13, at least 1.14, at least 1.15, at least 1.16, at least 1.17, at least 1.18, at least 1.19, at least 1.20, at least 1.21, at least 1.22, at least 1.23, at least 1.24, at least 1.25, at least 1.30, at least 1.35, at least 1.40, at least 1.50, at least 1.60, at least 1.70, at least 1.80, at least 1.90, at least 2.0, at least 2.5, at least 3.0, at least 4.0, and at least 5.0. In a particular embodiment, a risk (relative risk and/or odds ratio) of at least 1.15 is significant. In another particular embodiment, a risk of at least 1.17 is significant. In yet another embodiment, a risk of at least 1.20 is significant. In a further embodiment, a relative risk of at least about 1.25 is significant. In another further embodiment, a risk of at least about 1.30 is significant. However, other cutoffs are also contemplated, e.g. at least 1.16, 1.18, 1.19, 1.21, 1.22, and so on, and such cutoffs are also within the scope of the present invention.
In other embodiments, a significant increase in risk is at least about 10%, including but not limited to about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, and about 100%. In one particular embodiment, a significant increase in risk is at least 15%. In other embodiments, a significant increase in risk is at least 17%, at least 20%, at least 22%, at least 24%, at least 25%, at least 30%, at least 32% and at least 35%. Other cutoffs or ranges as deemed suitable by the person skilled in the art to characterize the invention are, however, also contemplated, and those are also within the scope of the present invention. In certain embodiments, a significant increase in risk is characterized by a p-value, such as a p-value of less than 0.05, less than 0.01, less than 1×10⁻³ (0.001), less than 1×10⁻⁴ (0.0001), less than 1×10⁻⁵ (0.00001), less than 1×10⁻⁶ (0.000001), less than 1×10⁻⁷ (0.0000001), less than 1×10⁻⁸ (0.00000001), or less than 1×10⁻⁹ (0.000000001).

An at-risk polymorphic marker or haplotype of the present invention is one where at least one allele of at least one marker or haplotype is more frequently present in an individual at risk for the disease or trait (affected), or diagnosed with the disease or trait, compared to the frequency of its presence in a comparison group (control), such that the presence of the marker or haplotype is indicative of susceptibility to the disease or trait (e.g., breast cancer). The control group may in one embodiment be a population sample, i.e. a random sample from the general population. In another embodiment, the control group is represented by a group of individuals who are disease-free, i.e. individuals who have not been diagnosed with breast cancer. Such a disease-free control group may in one embodiment be characterized by the absence of one or more specific disease-associated symptoms. In another embodiment, the disease-free control group is characterized by the absence of one or more disease-specific risk factors. Such risk factors are in one embodiment at least one environmental risk factor. Representative environmental factors are natural products, minerals or other chemicals which are known to affect, or contemplated to affect, the risk of developing the specific disease or trait. Other environmental risk factors are risk factors related to lifestyle, including but not limited to food and drink habits, geographical location of main habitat, and occupational risk factors. In another embodiment, the risk factors are at least one genetic risk factor.

An example of a simple test for correlation is a Fisher exact test on a two-by-two table. Given a cohort of chromosomes, the two-by-two table is constructed from the number of chromosomes that carry both of the markers or haplotypes, each one of the markers or haplotypes but not the other, and neither of the markers or haplotypes. Other statistical tests of association known to the skilled person are also contemplated and are also within the scope of the invention.
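A minimal sketch of the two-by-two Fisher exact test, written in Python with only the standard library, is given below. The chromosome counts in the example are hypothetical. The two-sided p-value is computed by summing the hypergeometric probabilities of all tables with the same margins that are no more probable than the observed table, which is one common definition of the two-sided test; statistical packages such as SciPy provide equivalent, more thoroughly tested routines.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows: first marker present / absent; columns: second marker present / absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum all tables at most as probable as the observed one; the small
    # relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical chromosome counts:
#   a = carry both markers, b = first marker only,
#   c = second marker only, d = neither marker.
p = fisher_exact_two_sided(8, 2, 1, 5)
print(f"{p:.4f}")  # prints 0.0350
```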

In other embodiments of the invention, an individual who is at a decreased susceptibility (i.e., at a decreased risk) for a disease or trait is an individual in whom at least one specific allele at one or more polymorphic markers or haplotypes conferring decreased susceptibility for the disease or trait is identified. The marker alleles and/or haplotypes conferring decreased risk are also said to be protective. In one aspect, the protective marker or haplotype is one that confers a significant decreased risk (or susceptibility) of the disease or trait. In one embodiment, significant decreased risk is measured as a relative risk of less than 0.90, including but not limited to less than 0.85, less than 0.80, less than 0.75, less than 0.7, less than 0.6, less than 0.5, less than 0.4, less than 0.3, less than 0.2 and less than 0.1. In one particular embodiment, significant decreased risk is less than 0.90. In another embodiment, significant decreased risk is less than 0.85. In yet another embodiment, significant decreased risk is less than 0.80. In another embodiment, the decrease in risk (or susceptibility) is at least 10%, including but not limited to at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95% and at least 98%. In one particular embodiment, a significant decrease in risk is at least about 15%. In another embodiment, a significant decrease in risk is at least about 20%. In another embodiment, the decrease in risk is at least about 25%. Other cutoffs or ranges as deemed suitable by the person skilled in the art to characterize the invention are, however, also contemplated, and those are also within the scope of the present invention.

The person skilled in the art will appreciate that for polymorphic markers with two alleles present in the population being studied (such as SNPs), and wherein one allele is found in increased frequency in a group of individuals with a trait or disease in the population, compared with controls, the other allele of the marker will be found in decreased frequency in the group of individuals with the trait or disease, compared with controls. In such a case, one allele of the marker (the one found in increased frequency in individuals with the trait or disease) will be the at-risk allele, while the other allele will be a protective allele.
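For a biallelic marker, the reciprocal relationship between the at-risk and protective alleles can be illustrated with the allelic odds ratio. The Python sketch below is a hypothetical example with invented allele counts; it shows that the odds ratio for one allele is the reciprocal of the odds ratio for the other, so that an at-risk allele with an OR above 1 implies a protective allele with an OR below 1.

```python
def allelic_odds_ratio(cases: dict, controls: dict, allele: str) -> float:
    """Odds ratio of `allele` versus the other allele of a biallelic marker.

    `cases` and `controls` map each of the two alleles to its count.
    """
    (other,) = [a for a in cases if a != allele]
    return (cases[allele] * controls[other]) / (cases[other] * controls[allele])


# Hypothetical allele counts (two alleles per individual).
cases = {"A": 600, "G": 400}     # A frequency 0.60 in patients
controls = {"A": 500, "G": 500}  # A frequency 0.50 in controls

or_a = allelic_odds_ratio(cases, controls, "A")  # at-risk allele
or_g = allelic_odds_ratio(cases, controls, "G")  # protective allele
print(round(or_a, 3), round(or_g, 3))  # prints 1.5 0.667
```

An OR of 1.5 for allele A corresponds to a 50% increase in risk, and the OR of about 0.667 for allele G to a decrease of about 33%.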

A genetic variant associated with a disease or a trait (e.g. breast cancer) can be used alone to predict the risk of the disease for a given genotype. For a biallelic marker, such as a SNP, there are 3 possible genotypes: homozygote for the at-risk variant, heterozygote, and non-carrier of the at-risk variant. Risk associated with variants at multiple loci can be used to estimate overall risk. For multiple SNP variants, there are k possible genotypes, where k=3ⁿ×2ᵖ, n is the number of autosomal loci and p the number of gonosomal (sex chromosomal) loci. Overall risk assessment calculations usually assume that the relative risks of different genetic variants multiply, i.e. the overall risk (e.g., RR or OR) associated with a particular genotype combination is the product of the risk values for the genotype at each locus. If the risk presented is the relative risk for a person, or a specific genotype for a person, compared to a reference population with matched gender and ethnicity, then the combined risk, which is the product of the locus-specific risk values, also corresponds to an overall risk estimate compared with the population. If the risk for a person is based on a comparison to non-carriers of the at-risk allele, then the combined risk corresponds to an estimate that compares the person with a given combination of genotypes at all loci to a group of individuals who do not carry risk variants at any of those loci. The group of non-carriers of any at-risk variant has the lowest estimated risk and has a combined risk, compared with itself (i.e., non-carriers), of 1.0, but has an overall risk, compared with the population, of less than 1.0. It should be noted that the group of non-carriers can potentially be very small, especially for a large number of loci, and in that case its relevance is correspondingly small.
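The multiplicative model described above can be sketched as follows. This is an illustrative Python example, not a validated risk calculator. It assumes (i) a per-allele multiplicative model at each autosomal locus, so that genotypes with 0, 1 or 2 at-risk alleles have risks 1, r and r² relative to non-carriers, and (ii) Hardy-Weinberg equilibrium, so that dividing by the mean risk in the population converts a non-carrier-based risk into a population-based one. The allelic risks and allele frequencies in the example are hypothetical.

```python
def genotype_risk(copies: int, r: float) -> float:
    """Risk relative to non-carriers for 0, 1 or 2 copies of the at-risk allele."""
    return r ** copies


def population_mean_risk(r: float, q: float) -> float:
    """Mean risk (relative to non-carriers) in a population at Hardy-Weinberg
    equilibrium, where q is the frequency of the at-risk allele."""
    p = 1.0 - q
    return p * p + 2 * p * q * r + q * q * r * r


def combined_risk(copies, rs, qs, vs_population=True) -> float:
    """Product of locus-specific risks across loci (multiplicative model).

    With vs_population=False the reference group is non-carriers at all loci;
    with vs_population=True each locus is rescaled to the population mean.
    """
    total = 1.0
    for k, r, q in zip(copies, rs, qs):
        risk = genotype_risk(k, r)
        if vs_population:
            risk /= population_mean_risk(r, q)
        total *= risk
    return total


# Hypothetical loci: allelic risks 1.2 and 1.3, at-risk allele frequencies 0.3 and 0.4.
rs, qs = [1.2, 1.3], [0.3, 0.4]
print(combined_risk([0, 0], rs, qs, vs_population=False))  # non-carriers vs themselves
print(combined_risk([0, 0], rs, qs))                        # non-carriers vs population
print(combined_risk([2, 1], rs, qs))                        # a higher-risk combination
```

The second line illustrates the point made in the text: the group carrying no risk alleles has a combined risk of 1.0 relative to itself but below 1.0 relative to the population.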

The multiplicative model is a parsimonious model that usually fits the data of complex traits reasonably well. Deviations from multiplicativity have rarely been described in the context of common variants for common diseases, and when reported are usually only suggestive, since very large sample sizes are usually required to demonstrate statistical interactions between loci.

By way of an example, let us consider a total of eight variants that have been described to associate with prostate cancer (Gudmundsson, J., et al., Nat Genet 39:631-7 (2007); Gudmundsson, J., et al., Nat Genet 39:977-83 (2007); Yeager, M., et al., Nat Genet 39:645-49 (2007); Amundadottir, L., et al., Nat Genet 38:652-8 (2006); Haiman, C. A., et al., Nat Genet 39:638-44 (2007)). Seven of these loci are on autosomes, and the remaining locus is on chromosome X. The total number of theoretical genotypic combinations is then 3⁷×2¹=4374. Some of those genotypic classes are very rare, but are still possible, and should be considered for overall risk assessment. It is likely that the multiplicative model applied in the case of multiple genetic variants will also be valid in conjunction with non-genetic risk factors, assuming that the genetic variant does not clearly correlate with the “environmental” factor. In other words, genetic and non-genetic at-risk variants can be assessed under the multiplicative model to estimate combined risk, assuming that the non-genetic and genetic risk factors do not interact.
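The genotype count in this example can be checked with a short Python sketch. It follows the text in assuming three genotypes per biallelic autosomal locus and two per X-chromosome locus (as in males, who carry a single X chromosome); the explicit enumeration is included only to confirm the closed-form count k = 3^n × 2^p.

```python
from itertools import product


def genotype_combinations(n_autosomal: int, n_gonosomal: int) -> int:
    """Closed-form count k = 3^n * 2^p of multi-locus genotype classes."""
    return 3 ** n_autosomal * 2 ** n_gonosomal


def enumerate_genotypes(n_autosomal: int, n_gonosomal: int):
    """Yield every genotype combination as a tuple of at-risk allele counts:
    0, 1 or 2 at each autosomal locus and 0 or 1 at each X-linked locus."""
    choices = [range(3)] * n_autosomal + [range(2)] * n_gonosomal
    return product(*choices)


# Seven autosomal loci and one X-linked locus, as in the prostate cancer example.
print(genotype_combinations(7, 1))                # prints 4374
print(sum(1 for _ in enumerate_genotypes(7, 1)))  # prints 4374
```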

Using the same quantitative approach, the combined or overall risk associated with a plurality of variants associated with breast cancer may be assessed.

Linkage Disequilibrium

The natural phenomenon of recombination, which occurs on average once for each chromosomal pair during each meiotic event, represents one way in which nature provides variations in sequence (and, consequently, in biological function). It has been discovered that recombination does not occur randomly in the genome; rather, there are large variations in recombination rates, resulting in small regions of high recombination frequency (also called recombination hotspots) and larger regions of low recombination frequency, which are commonly referred to as Linkage Disequilibrium (LD) blocks (Myers, S. et al., Biochem Soc Trans 34:526-530 (2006); Jeffreys, A. J., et al., Nature Genet 29:217-222 (2001); May, C. A., et al., Nature Genet 31:272-275 (2002)).

Linkage Disequilibrium (LD) refers to a non-random assortment of two genetic elements. For example, if a particular genetic element (e.g., an allele of a polymorphic marker, or a haplotype) occurs in a population at a frequency of 0.50 (50%) and another element occurs at a frequency of 0.50 (50%), then the predicted occurrence of a person's having both elements is 0.25 (25%), assuming a random distribution of the elements. However, if it is discovered that the two elements occur together at a frequency higher than 0.25, then the elements are said to be in linkage disequilibrium, since they tend to be inherited together at a higher rate than what their independent frequencies of occurrence (e.g., allele or haplotype frequencies) would predict. Roughly speaking, LD is generally inversely related to the frequency of recombination events between the two elements. Allele or haplotype frequencies can be determined in a population by genotyping individuals in a population and determining the frequency of the occurrence of each allele or haplotype in the population. For populations of diploids, e.g., human populations, individuals will typically have two alleles for each genetic element (e.g., a marker, haplotype or gene).

Many different measures have been proposed for assessing the strength of linkage disequilibrium (LD; reviewed in Devlin, B. & Risch, N., Genomics 29:311-22 (1995)). Most capture the strength of association between pairs of biallelic sites. Two important pairwise measures of LD are r² (sometimes denoted Δ²) and |D′| (Lewontin, R., Genetics 49:49-67 (1964); Hill, W. G. & Robertson, A. Theor. Appl. Genet. 22:226-231 (1968)). Both measures range from 0 (no disequilibrium) to 1 (‘complete’ disequilibrium), but their interpretation is slightly different. |D′| is defined in such a way that it is equal to 1 if just two or three of the possible haplotypes are present, and it is <1 if all four possible haplotypes are present. Therefore, a value of |D′| that is <1 indicates that historical recombination may have occurred between two sites (recurrent mutation can also cause |D′| to be <1, but for single nucleotide polymorphisms (SNPs) this is usually regarded as being less likely than recombination). The measure r² represents the statistical correlation between two sites, and takes the value of 1 if only two haplotypes are present.
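The pairwise measures D, |D′| and r² can be computed from haplotype and allele frequencies as in the following Python sketch, for two biallelic sites with alleles A/a and B/b. The function name and example frequencies are hypothetical. The examples reproduce the case of two elements each at frequency 0.50, and show a three-haplotype case in which |D′| equals 1 while r² is below 1.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple:
    """Return (D, |D'|, r^2) for two biallelic sites.

    p_ab: frequency of the haplotype carrying both A and B;
    p_a, p_b: population frequencies of alleles A and B (strictly between 0 and 1).
    """
    d = p_ab - p_a * p_b  # deviation from the frequency expected under independence
    # D' scales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


print(ld_measures(0.25, 0.5, 0.5))  # independence: D = 0, |D'| = 0, r^2 = 0
print(ld_measures(0.50, 0.5, 0.5))  # only AB and ab present: |D'| = 1, r^2 = 1
print(ld_measures(0.30, 0.5, 0.3))  # three haplotypes: |D'| = 1, r^2 < 1
```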

The r² measure is arguably the most relevant measure for association mapping, because there is a simple inverse relationship between r² and the sample size required to detect association between susceptibility loci and particular SNPs. These measures are defined for pairs of sites, but for some applications a determination of how strong LD is across an entire region that contains many polymorphic sites might be desirable (e.g., testing whether the strength of LD differs significantly among loci or across populations, or whether there is more or less LD in a region than predicted under a particular model). Measuring LD across a region is not straightforward, but one approach is to use the measure ρ, which was developed in population genetics. Roughly speaking, ρ measures how much recombination would be required under a particular population model to generate the LD that is seen in the data. This type of method can potentially also provide a statistically rigorous approach to the problem of determining whether LD data provide evidence for the presence of recombination hotspots. For the methods described herein, a significant r² value between genetic segments (such as SNP markers) can be at least 0.1, such as at least 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99 or 1.0. In one preferred embodiment, the significant r² value can be at least 0.2. Alternatively, linkage disequilibrium as described herein refers to linkage disequilibrium characterized by values of |D′| of at least 0.2, such as 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.96, 0.97, 0.98, 0.99.
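The inverse relationship between r² and sample size can be expressed with a commonly used approximation: testing a marker in LD (measured by r²) with a causal variant gives roughly the power that a direct test of the causal variant would have with r²·N samples, so the sample size must be increased by a factor of about 1/r². The Python sketch below applies only this approximation; it ignores differences in allele frequency and other factors affecting power, and the baseline sample size is hypothetical.

```python
from math import ceil


def required_sample_size(n_direct: int, r2: float) -> int:
    """Approximate sample size needed when typing a marker in LD with the
    causal variant, given the sample size n_direct needed to detect the
    causal variant directly (approximation: n_direct / r^2)."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must lie in (0, 1]")
    return ceil(n_direct / r2)


# Hypothetical: 2,000 samples would suffice if the causal variant were typed directly.
for r2 in (1.0, 0.5, 0.25):
    print(r2, required_sample_size(2000, r2))
```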

Thus, linkage disequilibrium represents a correlation between alleles of distinct markers. It is measured by the correlation coefficient r² or by |D′|, each of which takes values up to 1.0. Linkage disequilibrium can be determined in a single human population, as defined herein, or it can be determined in a collection of samples comprising individuals from more than one human population. In one embodiment of the invention, LD is determined in a sample from one or more of the HapMap populations (Caucasian, African, Japanese, Chinese), as defined (http://www.hapmap.org). In one such embodiment, LD is determined in the CEU population of the HapMap samples. In another embodiment, LD is determined in the YRI population. In another embodiment, LD is determined in a European population. In yet another embodiment, LD is determined in the Icelandic population.

If all polymorphisms in the genome were independent at the population level (i.e., not in LD with one another), then every single one of them would need to be investigated in association studies. However, due to linkage disequilibrium between polymorphisms, tightly linked polymorphisms are strongly correlated, which reduces the number of polymorphisms that need to be investigated in an association study to observe a significant association. Another consequence of LD is that many polymorphisms may give an association signal due to the fact that these polymorphisms are strongly correlated.

Genomic LD maps have been generated across the genome, and such LD maps have been proposed to serve as a framework for mapping disease genes (Risch, N. & Merikangas, K., Science 273:1516-1517 (1996); Maniatis, N., et al., Proc Natl Acad Sci USA 99:2228-2233 (2002); Reich, D. E. et al., Nature 411:199-204 (2001)).

It is now established that many portions of the human genome can be broken into series of discrete haplotype blocks containing a few common haplotypes; for these blocks, linkage disequilibrium data provide little evidence indicating recombination (see, e.g., Wall, J. D. and Pritchard, J. K., Nature Reviews Genetics 4:587-597 (2003); Daly, M. et al., Nature Genet. 29:229-232 (2001); Gabriel, S. B. et al., Science 296:2225-2229 (2002); Patil, N. et al., Science 294:1719-1723 (2001); Dawson, E. et al., Nature 418:544-548 (2002); Phillips, M. S. et al., Nature Genet. 33:382-387 (2003)).

There are two main methods for defining these haplotype blocks: blocks can be defined as regions of DNA that have limited haplotype diversity (see, e.g., Daly, M. et al., Nature Genet. 29:229-232 (2001); Patil, N. et al., Science 294:1719-1723 (2001); Dawson, E. et al., Nature 418:544-548 (2002); Zhang, K. et al., Proc. Natl. Acad. Sci. USA 99:7335-7339 (2002)), or as regions between transition zones having extensive historical recombination, identified using linkage disequilibrium (see, e.g., Gabriel, S. B. et al., Science 296:2225-2229 (2002); Phillips, M. S. et al., Nature Genet. 33:382-387 (2003); Wang, N. et al., Am. J. Hum. Genet. 71:1227-1234 (2002); Stumpf, M. P., and Goldstein, D. B., Curr. Biol. 13:1-8 (2003)). More recently, a fine-scale map of recombination rates and corresponding hotspots across the human genome has been generated (Myers, S., et al., Science 310:321-324 (2005); Myers, S. et al., Biochem Soc Trans 34:526-530 (2006)). The map reveals the enormous variation in recombination across the genome, with recombination rates as high as 10-60 cM/Mb in hotspots and close to 0 in intervening regions, which thus represent regions of limited haplotype diversity and high LD. The map can therefore be used to define haplotype blocks/LD blocks as regions flanked by recombination hotspots. As used herein, the terms “haplotype block” or “LD block” include blocks defined by any of the above described characteristics, or other alternative methods used by the person skilled in the art to define such regions.
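Defining LD blocks as regions flanked by recombination hotspots can be sketched as a simple segmentation of a recombination map. In the hypothetical Python example below, any inter-marker interval whose rate reaches a chosen threshold (here 10 cM/Mb, the lower end of the hotspot range quoted above) is treated as a hotspot and starts a new block. Published block-definition methods are considerably more elaborate; the threshold, positions and rates are illustrative assumptions.

```python
def ld_blocks(positions: list, rates: list, hotspot_cm_per_mb: float = 10.0) -> list:
    """Split ordered marker positions into blocks separated by hotspots.

    rates[i] is the recombination rate (cM/Mb) in the interval between
    positions[i] and positions[i + 1].
    """
    if not positions:
        return []
    if len(rates) != len(positions) - 1:
        raise ValueError("need one rate per interval between adjacent markers")
    blocks, current = [], [positions[0]]
    for pos, rate in zip(positions[1:], rates):
        if rate >= hotspot_cm_per_mb:
            blocks.append(current)  # hotspot interval: close the current block
            current = [pos]
        else:
            current.append(pos)
    blocks.append(current)
    return blocks


# Hypothetical marker positions (kb) and interval rates (cM/Mb).
print(ld_blocks([100, 120, 150, 160, 200], [0.1, 0.3, 25.0, 0.2]))
# prints [[100, 120, 150], [160, 200]]
```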

It has thus become apparent that for any given observed association to a polymorphic marker in the genome, it is likely that additional markers in the genome also show association. This is a natural consequence of the uneven distribution of LD across the genome, as observed by the large variation in recombination rates. The markers used to detect association thus in a sense represent “tags” for a genomic region (i.e., a haplotype block or LD block) that is associating with a given disease or trait. One or more causative (functional) variants or mutations may reside within the region found to be associating to the disease or trait. The functional variant may be another SNP, a tandem repeat polymorphism (such as a minisatellite or a microsatellite), a transposable element, an inversion, or a copy number variation, such as a deletion or insertion. Such variants in LD with the variants described herein may confer a higher relative risk (RR) or odds ratio (OR) than observed for the tagging markers used to detect the association. The present invention thus refers to the markers used for detecting association to the disease, as described herein, as well as markers in linkage disequilibrium with the markers. Thus, in certain embodiments of the invention, markers that are in LD with the markers and/or haplotypes of the invention, as described herein, may be used as surrogate markers. The surrogate markers have in one embodiment relative risk (RR) and/or odds ratio (OR) values smaller than for the markers or haplotypes initially found to be associating with the disease, as described herein. In other embodiments, the surrogate markers have RR or OR values greater than those determined for the markers initially found to be associating with the disease, as described herein.
An example of such an embodiment would be a rare, or relatively rare (<10% allelic population frequency) variant in LD with a more common variant (>10% population frequency) initially found to be associated with the disease, such as the variants described herein. Identifying and using such markers for detecting the association discovered by the inventors as described herein can be performed by routine methods well known to the person skilled in the art, and is therefore within the scope of the present invention.

Determination of Haplotype Frequency

The frequencies of haplotypes in patient and control groups can be estimated using an expectation-maximization (EM) algorithm (Dempster, A. et al., J. R. Stat. Soc. B, 39:1-38 (1977)). An implementation of this algorithm that can handle missing genotypes and uncertainty in phase can be used. Under the null hypothesis, the patients and the controls are assumed to have identical frequencies. Using a likelihood approach, an alternative hypothesis is tested, where a candidate at-risk haplotype, which can include the markers described herein, is allowed to have a higher frequency in patients than controls, while the ratios of the frequencies of other haplotypes are assumed to be the same in both groups. Likelihoods are maximized separately under both hypotheses and a corresponding 1-df likelihood ratio statistic is used to evaluate the statistical significance.
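The EM step can be sketched for the simplest case of two biallelic SNPs. This is a minimal illustration, not the implementation used by the inventors: genotypes are assumed to be coded as the count (0, 1 or 2) of allele "1" at each marker, with `None` for a missing call, and the names `compatible_pairs` and `em_haplotype_frequencies` are hypothetical. Phase ambiguity (double heterozygotes) and missing genotypes are both handled by summing over all haplotype pairs compatible with the observed data; the likelihood ratio test itself is not shown.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Optional, Tuple

# Haplotype "ab": a = allele (0/1) at marker A, b = allele (0/1) at marker B.
HAPLOTYPES = ("00", "01", "10", "11")

# Count of allele 1 at marker A and at marker B; None marks a missing call.
Genotype = Tuple[Optional[int], Optional[int]]


def compatible_pairs(g: Genotype) -> List[Tuple[str, str]]:
    """All unordered haplotype pairs consistent with a (possibly missing) genotype."""
    pairs = []
    for h1, h2 in combinations_with_replacement(HAPLOTYPES, 2):
        ok_a = g[0] is None or int(h1[0]) + int(h2[0]) == g[0]
        ok_b = g[1] is None or int(h1[1]) + int(h2[1]) == g[1]
        if ok_a and ok_b:
            pairs.append((h1, h2))
    return pairs


def em_haplotype_frequencies(genotypes: List[Genotype], max_iter: int = 1000,
                             tol: float = 1e-10) -> Dict[str, float]:
    """Maximum likelihood haplotype frequencies assuming Hardy-Weinberg equilibrium."""
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform starting point
    candidates = [compatible_pairs(g) for g in genotypes]
    for _ in range(max_iter):
        counts = {h: 0.0 for h in HAPLOTYPES}
        # E-step: distribute each individual's two haplotypes over compatible pairs.
        for pairs in candidates:
            weights = [freqs[h1] * freqs[h2] * (1 if h1 == h2 else 2)
                       for h1, h2 in pairs]
            total = sum(weights)
            if total == 0:
                continue
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: new frequencies are expected counts over all haplotype copies.
        n_haps = sum(counts.values())
        new = {h: c / n_haps for h, c in counts.items()}
        delta = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES)
        freqs = new
        if delta < tol:
            break
    return freqs
```

Running the function separately on patients, on controls and on the pooled sample gives the frequency estimates from which null and alternative likelihoods could be built.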

To look for at-risk and protective markers and haplotypes within a susceptibility region, for example within an LD block region, association of all possible combinations of genotyped markers within the region is studied. The combined patient and control groups can be randomly divided into two sets, equal in size to the original groups of patients and controls. The marker and haplotype analysis is then repeated and the most significant p-value obtained is recorded. This randomization scheme can be repeated, for example, over 100 times to construct an empirical distribution of p-values. In a preferred embodiment, a p-value of <0.05 is indicative of a significant marker and/or haplotype association.
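The randomization scheme can be sketched as follows, under simplifying assumptions: only single markers are tested (not marker combinations or haplotypes), each individual is a list of genotypes coded as the count (0/1/2) of allele "1" per marker, and a 1-df allelic chi-square test stands in for whatever per-marker test is actually used. The helper names are hypothetical. The empirical p-value is the fraction of random splits whose most significant p-value is at least as small as the observed one.

```python
import math
import random
from typing import List, Sequence

Individual = Sequence[int]  # genotype per marker: copies of allele 1 (0, 1 or 2)


def chi2_1df_pvalue(a: int, b: int, c: int, d: int) -> float:
    """Allelic 2x2 chi-square p-value; rows = cases/controls, cols = allele 1/2."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # monomorphic marker or empty group: no evidence
    chi2 = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def min_marker_pvalue(cases: List[Individual], controls: List[Individual]) -> float:
    """Most significant single-marker p-value across all markers."""
    best = 1.0
    for m in range(len(cases[0])):
        a = sum(ind[m] for ind in cases)
        c = sum(ind[m] for ind in controls)
        best = min(best, chi2_1df_pvalue(a, 2 * len(cases) - a,
                                         c, 2 * len(controls) - c))
    return best


def empirical_pvalue(cases: List[Individual], controls: List[Individual],
                     n_perm: int = 100, seed: int = 0) -> float:
    """Fraction of random case/control splits at least as significant as observed."""
    rng = random.Random(seed)
    observed = min_marker_pvalue(cases, controls)
    pooled = list(cases) + list(controls)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # new split with the original group sizes
        if min_marker_pvalue(pooled[:len(cases)], pooled[len(cases):]) <= observed:
            hits += 1
    return hits / n_perm
```

Because the minimum p-value over all markers is recorded in each replicate, the resulting empirical p-value already accounts for the number of markers examined.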

Haplotype Analysis

One general approach to haplotype analysis involves using likelihood-based inference applied to NEsted MOdels (Gretarsdottir S., et al., Nat. Genet. 35:131-38 (2003)). The method is implemented in the program NEMO, which allows for many polymorphic markers, SNPs and microsatellites. The method and software are specifically designed for case-control studies where the purpose is to identify haplotype groups that confer different risks. It is also a tool for studying LD structures. In NEMO, maximum likelihood estimates, likelihood ratios and p-values are calculated directly, with the aid of the EM algorithm, for the observed data, treating it as a missing-data problem.

Even though likelihood ratio tests based on likelihoods computed directly for the observed data, which have captured the information loss due to uncertainty in phase and missing genotypes, can be relied on to give valid p-values, it would still be of interest to know how much information had been lost due to the information being incomplete. The information measure for haplotype analysis is described in Nicolae and Kong (Technical Report 537, Department of Statistics, University of Chicago; Biometrics, 60(2):368-75 (2004)) as a natural extension of information measures defined for linkage analysis, and is implemented in NEMO.

For single-marker association to a disease, the Fisher exact test can be used to calculate two-sided p-values for each individual allele. Usually, all p-values are presented unadjusted for multiple comparisons unless specifically indicated. The presented frequencies (for microsatellites, SNPs and haplotypes) are allelic frequencies as opposed to carrier frequencies. To minimize any bias due to the relatedness of the patients who were recruited as families, first- and second-degree relatives can be eliminated from the patient list. Furthermore, the test can be repeated for association correcting for any remaining relatedness among the patients, by extending a variance adjustment procedure previously described for sibships (Risch, N. & Teng, J., Genome Res., 8:1273-1288 (1998)) so that it can be applied to general familial relationships, and both adjusted and unadjusted p-values can be presented for comparison. The method of genomic control (Devlin, B. & Roeder, K., Biometrics 55:997 (1999)) can also be used to adjust for the relatedness of the individuals and possible stratification. The differences are in general very small, as expected. To assess the significance of single-marker association corrected for multiple testing, we can carry out a randomization test using the same genotype data. Cohorts of patients and controls can be randomized and the association analysis redone multiple times (e.g., up to 500,000 times), and the p-value is the fraction of replications that produced a p-value for some marker allele that is lower than or equal to the p-value we observed using the original patient and control cohorts.
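The allelic Fisher exact test and the genomic control inflation factor can be sketched in a few lines of standard-library Python. In this sketch the 2x2 table is assumed to hold allele counts (cases and controls by allele 1 and allele 2), the two-sided p-value sums all tables with the same margins that are no more probable than the observed one (a common convention), and the genomic control factor uses the usual form λ = max(1, median χ² / 0.4549), where 0.4549 is the median of a 1-df chi-square distribution. Function names are illustrative.

```python
import math
from statistics import median
from typing import Sequence


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls. Columns: copies of allele 1, copies of allele 2.
    """
    n = a + b + c + d
    row1 = a + b  # total allele copies in cases
    col1 = a + c  # total copies of allele 1
    denom = math.comb(n, row1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    total = 0.0
    for x in range(max(0, row1 - (n - col1)), min(row1, col1) + 1):
        px = prob(x)
        if px <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            total += px
    return min(1.0, total)


def genomic_control_lambda(chi2_stats: Sequence[float]) -> float:
    """Inflation factor: median observed 1-df chi-square over its null median."""
    return max(1.0, median(chi2_stats) / 0.454936)
```

Dividing each observed 1-df chi-square statistic by λ before converting it to a p-value gives the genomic-control-adjusted result.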

For both single-marker and haplotype analyses, relative risk (RR) and the population attributable risk (PAR) can be calculated assuming a multiplicative model (haplotype relative risk model) (Terwilliger, J. D. & Ott, J., Hum. Hered. 42:337-46 (1992) and Falk, C. T. & Rubinstein, P., Ann. Hum. Genet. 51 (Pt 3):227-33 (1987)), i.e., that the risks of the two alleles/haplotypes a person carries multiply. For example, if RR is the risk of A relative to a, then the risk of an AA homozygote will be RR times that of an Aa heterozygote and RR² times that of an aa homozygote. The multiplicative model has a convenient property that simplifies analysis and computations: haplotypes are independent, i.e., in Hardy-Weinberg equilibrium, within the affected population as well as within the control population. As a consequence, haplotype counts of the affected individuals and of the controls each have multinomial distributions, but with different haplotype frequencies under the alternative hypothesis. Specifically, for two haplotypes $h_i$ and $h_j$, $\mathrm{risk}(h_i)/\mathrm{risk}(h_j) = (f_i/p_i)/(f_j/p_j)$, where $f$ and $p$ denote, respectively, frequencies in the affected population and in the control population. While there is some power loss if the true model is not multiplicative, the loss tends to be mild except in extreme cases. Most importantly, p-values are always valid since they are computed with respect to the null hypothesis.
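The multiplicative model can be made concrete with a short sketch. It shows the genotype risks implied by a per-allele RR, the haplotype risk ratio computed from case and control frequencies as in the formula above, and a PAR under Hardy-Weinberg equilibrium. The PAR expression is the standard one for a multiplicative model, 1 − 1/(1 − p + p·RR)², where p is the population allele frequency. Using the control frequency as a stand-in for p is an assumption that is reasonable mainly for uncommon diseases. Function names are illustrative.

```python
from typing import Tuple


def genotype_relative_risks(rr: float) -> Tuple[float, float, float]:
    """Risks of aa, Aa and AA relative to aa when allele A has per-copy risk rr."""
    return 1.0, rr, rr * rr


def haplotype_rr(f_i: float, p_i: float, f_j: float, p_j: float) -> float:
    """risk(h_i)/risk(h_j) = (f_i/p_i)/(f_j/p_j); f = case freq, p = control freq."""
    return (f_i / p_i) / (f_j / p_j)


def population_attributable_risk(p: float, rr: float) -> float:
    """PAR of allele A with population frequency p under HWE and a multiplicative model.

    The mean population risk relative to aa is
    (1-p)^2 + 2p(1-p)rr + p^2 rr^2 = (1 - p + p*rr)^2.
    """
    mean_risk = (1.0 - p + p * rr) ** 2
    return 1.0 - 1.0 / mean_risk
```

For instance, an allele with frequency 0.5 and a per-copy RR of 2.0 gives a mean population risk of 2.25 relative to non-carriers, and hence a PAR of 1 − 1/2.25 ≈ 0.56.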

An association signal detected in one association study may be replicated in a second cohort, ideally from a different population (e.g., a different region of the same country, or a different country) of the same or different ethnicity. The advantage of replication studies is that the number of tests performed in the replication study is small, and hence a less stringent statistical threshold can be applied. For example, for a genome-wide search for susceptibility variants for a particular disease or trait using 300,000 SNPs, a correction for the 300,000 tests performed (one for each SNP) can be applied. Since many SNPs on the arrays typically used are correlated (i.e., in LD), they are not independent. Thus, the correction is conservative. Nevertheless, applying this correction factor requires an observed P-value of less than 0.05/300,000 = 1.7×10⁻⁷ for the signal to be considered significant when this conservative test is applied to results from a single study cohort. Signals found in a genome-wide association study with P-values below this conservative threshold can therefore be regarded as reflecting a true genetic effect, and replication in additional cohorts is not strictly necessary from a statistical point of view. However, since the correction factor depends on the number of statistical tests performed, if one signal (one SNP) from an initial study is replicated in a second case-control cohort, the appropriate statistical test for significance is that for a single statistical test, i.e., a P-value less than 0.05. Replication studies in one or even several additional case-control cohorts have the added advantage of providing assessment of the association signal in additional populations, thus simultaneously confirming the initial finding and providing an assessment of the overall significance of the genetic variant(s) being tested in human populations in general.
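The Bonferroni arithmetic above can be written out as a trivial helper. The function name is illustrative, and the correction is the conservative one the text describes, which ignores correlation between SNPs.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


# Genome-wide scan of 300,000 SNPs: roughly 1.7e-7, as stated in the text.
genome_wide = bonferroni_threshold(0.05, 300_000)
# Replication of one pre-specified SNP is a single test: the threshold stays 0.05.
replication = bonferroni_threshold(0.05, 1)
```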

The results from several case-control cohorts can also be combined to provide an overall assessment of the underlying effect. The methodology commonly used to combine results from multiple genetic association studies is the Mantel-Haenszel model (Mantel and Haenszel, J Natl Cancer Inst 22:719-48 (1959)). The model is designed to deal with the situation where association results from different populations, each possibly having a different population frequency of the genetic variant, are combined. The model combines the results assuming that the effect of the variant on the risk of the disease, as measured by the OR or RR, is the same in all populations, while the frequency of the variant may differ between the populations. Combining the results from several populations has the added advantage that the overall power to detect a real underlying association signal is increased, due to the increased statistical power provided by the combined cohorts. Furthermore, any deficiencies in individual studies, for example due to unequal matching of cases and controls or population stratification, will tend to balance out when results from multiple cohorts are combined, again providing a better estimate of the true underlying genetic effect.
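A minimal sketch of the Mantel-Haenszel pooled odds ratio follows, assuming each cohort is summarized as one 2x2 table (a, b) = cases with/without the variant and (c, d) = controls with/without the variant. The pooled estimate is Σ(a·d/n) / Σ(b·c/n), so each cohort keeps its own variant frequency while a common OR is assumed. The accompanying Cochran-Mantel-Haenszel test statistic is not shown.

```python
from typing import Iterable, Tuple

Table = Tuple[float, float, float, float]  # (a, b, c, d) for one cohort


def mantel_haenszel_or(tables: Iterable[Table]) -> float:
    """Pooled OR across cohorts: sum(a*d/n) / sum(b*c/n).

    a, b: case counts with/without the variant; c, d: control counts with/without.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den
```

With a single cohort the estimate reduces to the ordinary odds ratio a·d/(b·c); for two cohorts with tables (10, 5, 5, 10) and (6, 4, 4, 6) it gives 22/7 ≈ 3.14, between the per-cohort ORs of 4.0 and 2.25.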

Risk Assessment and Diagnostics

Within any given population, there is an absolute risk of developing a disease or trait, defined as the chance of a person developing the specific disease or trait over a specified time period. For example, a woman's lifetime absolute risk of breast cancer is one in nine. That is to say, one woman in every nine will develop breast cancer at some point in her life. Risk is typically measured by looking at very large numbers of people, rather than at a particular individual. Risk is often presented in terms of Absolute Risk (AR) and Relative Risk (RR). Relative Risk is used to compare risks associated with two variants or the risks of two different groups of people. For example, it can be used to compare a group of people with a certain genotype with another group having a different genotype. For a disease, a relative risk of 2 means that one group has twice the chance of developing the disease as the other group. The risk presented is usually the relative risk for a person, or a specific genotype of a person, compared to the population with matched gender and ethnicity. Risks of two individuals of the same gender and ethnicity can be compared in a simple manner. For example, if, compared to the population, the first individual has relative risk 1.5 and the second has relative risk 0.5, then the risk of the first individual compared to the second individual is 1.5/0.5 = 3.

As described herein, certain polymorphic markers and haplotypes comprising such markers are found to be useful for risk assessment of breast cancer. Risk assessment can involve the use of the markers for diagnosing a susceptibility to breast cancer. Particular alleles of polymorphic markers are found more frequently in individuals with breast cancer than in individuals without a diagnosis of breast cancer. Therefore, these marker alleles have predictive value for detecting breast cancer, or a susceptibility to breast cancer, in an individual. Tagging markers within haplotype blocks or LD blocks comprising at-risk markers, such as the markers of the present invention, can be used as surrogates for other markers and/or haplotypes within the haplotype block or LD block. Markers with values of r² equal to 1 are perfect surrogates for the at-risk variants, i.e., genotypes for one marker perfectly predict genotypes for the other. Markers with values of r² smaller than 1 can also be surrogates for the at-risk variant, or alternatively represent variants with relative risk values as high as, or possibly even higher than, the at-risk variant. The at-risk variant identified may not be the functional variant itself, but is in this instance in linkage disequilibrium with the true functional variant. The functional variant may for example be a tandem repeat, such as a minisatellite or a microsatellite, a transposable element (e.g., an Alu element), or a structural alteration, such as a deletion, insertion or inversion (sometimes also called copy number variations, or CNVs). The present invention encompasses the assessment of such surrogate markers for the markers as disclosed herein.
Such markers are annotated, mapped and listed in public databases, as is well known to the skilled person, or can alternatively be readily identified by sequencing the region, or a part of the region, identified by the markers of the present invention in a group of individuals, and identifying polymorphisms in the resulting group of sequences. As a consequence, the person skilled in the art can readily and without undue experimentation genotype surrogate markers in linkage disequilibrium with the markers and/or haplotypes as described herein. The tagging or surrogate markers in LD with the at-risk variants detected also have predictive value for detecting association to breast cancer, or a susceptibility to breast cancer, in an individual. These tagging or surrogate markers that are in LD with the markers of the present invention can also include other markers that distinguish among haplotypes, as these similarly have predictive value for detecting susceptibility to breast cancer.

The present invention can in certain embodiments be practiced by assessing a sample comprising genomic DNA from an individual for the presence of variants described herein to be associated with breast cancer. Such assessment includes steps of detecting the presence or absence of at least one allele of at least one polymorphic marker, using methods well known to the skilled person and further described herein, and, based on the outcome of such assessment, determining whether the individual from whom the sample is derived is at increased or decreased risk (increased or decreased susceptibility) of breast cancer. Alternatively, the invention can be practiced utilizing a dataset comprising information about the genotype status of at least one polymorphic marker described herein to be associated with breast cancer (or markers in linkage disequilibrium with at least one marker shown herein to be associated with breast cancer). In other words, a dataset containing information about such genetic status, for example in the form of genotype counts at a certain polymorphic marker, or a plurality of markers (e.g., an indication of the presence or absence of certain at-risk alleles), or actual genotypes for one or more markers, can be queried for the presence or absence of certain at-risk alleles at certain polymorphic markers shown by the present inventors to be associated with breast cancer. A positive result for a variant (e.g., marker allele) associated with increased risk of breast cancer, as shown herein, indicates that the individual from whom the dataset is derived is at increased susceptibility (increased risk) of breast cancer.

In certain embodiments of the invention, a polymorphic marker is correlated to breast cancer by referencing genotype data for the polymorphic marker to a look-up table that comprises correlations between at least one allele of the polymorphism and breast cancer. In some embodiments, the table comprises a correlation for one polymorphism. In other embodiments, the table comprises a correlation for a plurality of polymorphisms. In both scenarios, by referencing a look-up table that gives an indication of a correlation between a marker and breast cancer, a risk for breast cancer, or a susceptibility to breast cancer, can be identified in the individual from whom the sample is derived. In some embodiments, the correlation is reported as a statistical measure. The statistical measure may be reported as a risk measure, such as a relative risk (RR), an absolute risk (AR) or an odds ratio (OR).
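A look-up table of this kind can be sketched as a nested mapping from marker to genotype to a risk measure. Everything in the table below is hypothetical: the marker names and risk values are placeholders chosen only to exercise the code, not results for any marker of the invention. Genotypes are compared without regard to allele order, so "GA" and "AG" refer to the same entry.

```python
from typing import Dict

# Hypothetical table: marker -> genotype -> relative risk versus the population.
# Marker names and values are placeholders, NOT study results.
LOOKUP: Dict[str, Dict[str, float]] = {
    "marker_X": {"AA": 0.8, "AG": 1.0, "GG": 1.3},
    "marker_Y": {"CC": 0.9, "CT": 1.1, "TT": 1.4},
}


def lookup_risk(table: Dict[str, Dict[str, float]], marker: str, genotype: str) -> float:
    """Return the tabulated risk for a genotype; allele order is ignored."""
    entries = table[marker]  # KeyError if the marker is not tabulated
    key = "".join(sorted(genotype))
    for stored, risk in entries.items():
        if "".join(sorted(stored)) == key:
            return risk
    raise KeyError(f"genotype {genotype!r} not tabulated for {marker!r}")
```

In practice the table would hold the RR, AR or OR estimates actually reported for each marker, and a missing entry would be treated as "no information" rather than as average risk.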

The markers of the invention, e.g., polymorphic markers on Chromosome 5p12 and Chromosome 10q26, e.g., the markers presented in Tables 12, 13 and 14, e.g., markers rs7703618, rs4415084, rs2067980, rs10035564, rs11743392, rs7716600, rs10941679 and rs1219648, may be useful for risk assessment and diagnostic purposes, either alone or in combination. Even in cases where the increase in risk conferred by individual markers is relatively modest, i.e., on the order of 10-30%, the association may have significant implications. Thus, relatively common variants may make a significant contribution to the overall risk (the Population Attributable Risk is high), or a combination of markers can be used to define groups of individuals who, based on the combined risk of the markers, are at significant combined risk of developing the disease.

For example, combined risk can be assessed based on genotype results for markers on chromosome 5p12 and chromosome 10q26, such as marker rs10941679 and marker rs1219648. Alternatively, markers in LD with either of these markers could be assessed. Other markers known to confer risk of breast cancer can also be assessed together with the markers described herein, such as markers on chromosome 2q14 (e.g., marker rs4848543 or markers in linkage disequilibrium therewith), 2q35 (e.g., marker rs13387042, or markers in linkage disequilibrium therewith), and chromosome 16 (e.g., marker rs3803662, or markers in linkage disequilibrium therewith).

Thus, in one embodiment of the invention, a plurality of variants (markers and/or haplotypes) is used for overall risk assessment. These variants are in one embodiment selected from the variants as disclosed herein. Other embodiments include the use of the variants of the present invention in combination with other variants known to be useful for diagnosing a susceptibility to breast cancer. Results for any two or more markers can be combined in such analysis, such as results for three markers, four markers, five markers, six markers, seven markers, eight markers, nine markers, or ten or more markers. In such embodiments, the genotype status of a plurality of markers and/or haplotypes is determined in an individual, and the status of the individual is compared with the population frequency of the associated variants, or the frequency of the variants in clinically healthy subjects, such as age-matched and sex-matched subjects. Methods known in the art, such as multivariate analyses or joint risk analyses, may subsequently be used to determine the overall risk conferred based on the genotype status at the multiple loci. Assessment of risk based on such analysis may subsequently be used in the methods and kits of the invention, as described herein.
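One simple joint risk analysis can be sketched under explicit assumptions: each locus follows a multiplicative model in Hardy-Weinberg equilibrium, loci are independent, and there is no interaction between them. Under those assumptions, an individual's risk at one locus relative to the population average is r^k / (1 − p + p·r)², where p is the risk-allele frequency, r the per-allele RR and k the number of risk alleles carried. The combined risk is the product over loci. Any parameter values used with these functions would have to come from actual association results; none are implied here.

```python
from math import prod
from typing import Iterable, Tuple

# (risk-allele frequency p, per-allele relative risk r, risk-allele copies k)
Locus = Tuple[float, float, int]


def risk_vs_population(p: float, r: float, k: int) -> float:
    """Risk of a genotype with k risk alleles relative to the population average.

    Assumes a multiplicative model within the locus and Hardy-Weinberg
    equilibrium, so the population mean risk relative to non-carriers
    is (1 - p + p*r)**2.
    """
    return r ** k / (1.0 - p + p * r) ** 2


def combined_risk(loci: Iterable[Locus]) -> float:
    """Multiply per-locus risks, assuming independent loci and no interaction."""
    return prod(risk_vs_population(p, r, k) for p, r, k in loci)
```

The combined value is on the same scale as the relative risks compared in the risk assessment discussion above: a result of 1.0 means average population risk, so two individuals can be compared by taking the ratio of their combined values.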

As described above, the haplotype block structure of the human genome has the effect that a large number of variants (markers and/or haplotypes) in linkage disequilibrium with the variant originally associated with a disease or trait may be used as surrogate markers for assessing association to the disease or trait. The number of such surrogate markers will depend on factors such as the historical recombination rate in the region, the mutational frequency in the region (i.e., the number of polymorphic sites or markers in the region), and the extent of LD (size of the LD block) in the region. These markers are usually located within the physical boundaries of the LD block or haplotype block in question as defined using the methods described herein, or by other methods known to the person skilled in the art. However, marker and haplotype association is sometimes found to extend beyond the physical boundaries of the haplotype block as defined. Such markers and/or haplotypes may in those cases also be used as surrogate markers and/or haplotypes for the markers and/or haplotypes physically residing within the haplotype block as defined. As a consequence, markers and haplotypes in LD (typically characterized by r² greater than 0.1, such as r² greater than 0.2, including r² greater than 0.3, also including r² greater than 0.4) with the markers and haplotypes of the present invention are also within the scope of the invention, even if they are physically located beyond the boundaries of the haplotype block as defined. This includes markers that are described herein (e.g., Table 1 and Table 3, e.g., rs4415084, rs10941679, rs1219648), but may also include other markers that are in strong LD (characterized by r² greater than 0.1 or 0.2 and/or |D′| > 0.8) with one or more of the markers listed in Table 1 and Table 3.
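The LD measures used in the thresholds above can be computed from two-marker haplotype frequencies, for example as estimated by EM. This sketch uses the standard definitions: D = p₁₁ − p_A·p_B, r² = D² / (p_A(1 − p_A)p_B(1 − p_B)), and D′ = D / D_max. It assumes the four haplotype frequencies are keyed "00", "01", "10", "11", with the first character giving the allele at marker A; the function name is illustrative.

```python
from typing import Dict, Tuple


def ld_measures(hap_freqs: Dict[str, float]) -> Tuple[float, float]:
    """Return (r2, |D'|) for two biallelic markers.

    hap_freqs maps "ab" -> frequency, where a and b are the alleles (0/1) at
    markers A and B; the four frequencies should sum to 1.
    """
    p11 = hap_freqs.get("11", 0.0)
    p_a = hap_freqs.get("10", 0.0) + p11  # frequency of allele 1 at marker A
    p_b = hap_freqs.get("01", 0.0) + p11  # frequency of allele 1 at marker B
    var = p_a * (1 - p_a) * p_b * (1 - p_b)
    if var == 0:
        raise ValueError("both markers must be polymorphic")
    d = p11 - p_a * p_b
    r2 = d * d / var
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max
```

A candidate surrogate could then be retained if, for example, r² exceeds 0.2 or |D′| exceeds 0.8, matching the ranges given in the text. Note that |D′| can equal 1 while r² is modest when allele frequencies differ, which is why the two criteria are stated separately.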

For the SNP markers described herein, the allele opposite to the allele found to be in excess in patients (the at-risk allele) is found at decreased frequency in breast cancer patients. These markers, and haplotypes in LD with and/or comprising such markers, are thus protective for breast cancer, i.e., they confer a decreased risk of, or susceptibility to, developing breast cancer in individuals carrying these markers and/or haplotypes.

Certain variants of the present invention, including certain haplotypes, comprise in some cases a combination of various genetic markers, e.g., SNPs and microsatellites. Detecting haplotypes can be accomplished by methods known in the art and/or described herein for detecting sequences at polymorphic sites. Furthermore, correlation between certain haplotypes or sets of markers and disease phenotype can be verified using standard techniques. A representative example of a simple test for correlation would be a Fisher exact test on a two-by-two table.

In specific embodiments, a marker allele or haplotype found to be associated with breast cancer (e.g., marker alleles as listed in Table 1 and Table 3, the markers as listed in Tables 12, 13 and 14, SEQ ID NO:1-237) is one in which the marker allele or haplotype is more frequently present in an individual at risk for breast cancer (affected), compared to the frequency of its presence in a healthy individual (control), wherein the presence of the marker allele or haplotype is indicative of breast cancer or a susceptibility to breast cancer. In other embodiments, at-risk markers in linkage disequilibrium with one or more markers found to be associated with breast cancer are tagging markers that are more frequently present in an individual at risk for breast cancer (affected), compared to the frequency of their presence in a healthy individual (control), wherein the presence of the tagging markers is indicative of increased susceptibility to breast cancer. In a further embodiment, at-risk marker alleles (i.e., conferring increased susceptibility) in linkage disequilibrium with one or more markers found to be associated with breast cancer are markers comprising one or more alleles that are more frequently present in an individual at risk for breast cancer, compared to the frequency of their presence in a healthy individual (control), wherein the presence of the markers is indicative of increased susceptibility to breast cancer.

Study Population

In a general sense, the methods and kits of the invention can be applied to samples containing genomic DNA from any source, i.e., any individual. In preferred embodiments, the individual is a human individual. The individual can be an adult, child, or fetus. The present invention also provides for assessing markers and/or haplotypes in individuals who are members of a target population. Such a target population is in one embodiment a population or group of individuals at risk of developing the disease, based on other genetic factors, biomarkers, biophysical parameters (e.g., weight, BMD, blood pressure), or general health and/or lifestyle parameters (e.g., history of cancer, history of breast cancer, previous diagnosis of disease, family history of cancer, family history of breast cancer).

The invention provides for embodiments that include individuals from specific age subgroups, such as those over the age of 40, over the age of 45, or over the age of 50, 55, 60, 65, 70, 75, 80, or 85. Other embodiments of the invention pertain to other age groups, such as individuals aged less than 85, such as less than age 80, less than age 75, or less than age 70, 65, 60, 55, 50, 45, 40, 35, or 30. Other embodiments relate to individuals with age at onset of the disease in any of the age ranges described above. It is also contemplated that a range of ages may be relevant in certain embodiments, such as age at onset of more than 45 but less than 60. Other age ranges are however also contemplated, including all age ranges bracketed by the age values listed above. The invention furthermore relates to individuals of either sex, males or females. In some embodiments, it relates to assessment of male subjects. In preferred embodiments, it relates to assessment of female subjects.

The Icelandic population is a Caucasian population of Northern European ancestry. A large number of studies reporting results of genetic linkage and association in the Icelandic population have been published in the last few years. Many of those studies show replication of variants, originally identified in the Icelandic population as being associated with a particular disease, in other populations (Styrkarsdottir, U., et al. N Engl J Med Apr. 29 2008 (Epub ahead of print); Thorgeirsson, T., et al. Nature 452:638-42 (2008); Gudmundsson, J., et al. Nat Genet. 40:281-3 (2008); Stacey, S. N., et al., Nat Genet. 39:865-69 (2007); Helgadottir, A., et al., Science 316:1491-93 (2007); Steinthorsdottir, V., et al., Nat Genet. 39:770-75 (2007); Gudmundsson, J., et al., Nat Genet. 39:631-37 (2007); Frayling, T M, Nature Reviews Genet 8:657-662 (2007); Amundadottir, L. T., et al., Nat Genet. 38:652-58 (2006); Grant, S. F., et al., Nat Genet. 38:320-23 (2006)). Thus, genetic findings in the Icelandic population have in general been replicated in other populations, including populations from Africa and Asia.

The markers of the present invention found to be associated with breast cancer are believed to show similar association in other human populations. Particular embodiments comprising individual human populations are thus also contemplated and within the scope of the invention. Such embodiments relate to human subjects that are from one or more human populations including, but not limited to, Caucasian populations, European populations, American populations, Eurasian populations, Asian populations, Central/South Asian populations, East Asian populations, Middle Eastern populations, African populations, Hispanic populations, and Oceanian populations. European populations include, but are not limited to, Swedish, Norwegian, Finnish, Russian, Danish, Icelandic, Irish, Kelt, English, Scottish, Dutch, Belgian, French, German, Spanish, Portuguese, Italian, Polish, Bulgarian, Slavic, Serbian, Bosnian, Czech, Greek and Turkish populations. The invention furthermore in other embodiments can be practiced in specific human populations that include Bantu, Mandenka, Yoruba, San, Mbuti Pygmy, Orcadian, Adygei, Russian, Sardinian, Tuscan, Mozabite, Bedouin, Druze, Palestinian, Balochi, Brahui, Makrani, Sindhi, Pathan, Burusho, Hazara, Uygur, Kalash, Han, Dai, Daur, Hezhen, Lahu, Miao, Oroqen, She, Tujia, Tu, Xibo, Yi, Mongola, Naxi, Cambodian, Japanese, Yakut, Melanesian, Papuan, Karitiana, Surui, Colombian, Maya and Pima.

In certain embodiments, the invention relates to populations that include black African ancestry, such as populations comprising persons of African descent or lineage. Black African ancestry may be determined by self-reporting as African-Americans, Afro-Americans, Black Americans, being a member of the black race or being a member of the negro race. For example, African Americans or Black Americans are those persons living in North America and having origins in any of the black racial groups of Africa. In another example, self-reported persons of black African ancestry may have at least one parent of black African ancestry or at least one grandparent of black African ancestry.

The racial contribution in individual subjects may also be determined by genetic analysis. Genetic analysis of ancestry may be carried out using unlinked microsatellite markers such as those set out in Smith et al. (Am J Hum Genet 74, 1001-13 (2004)).

In certain embodiments, the invention relates to markers and/or haplotypes identified in specific populations, as described above. The person skilled in the art will appreciate that measures of linkage disequilibrium (LD) may give different results when applied to different populations. This is due to the different population histories of different human populations, as well as differential selective pressures that may have led to differences in LD in specific genomic regions. It is also well known to the person skilled in the art that certain markers, e.g., SNP markers, are polymorphic in one population but not in another. The person skilled in the art will however apply the methods available and as taught herein to practice the present invention in any given human population. This may include assessment of polymorphic markers in the LD region of the present invention, so as to identify those markers that give the strongest association within the specific population. Thus, the at-risk variants of the present invention may reside on different haplotype backgrounds and at different frequencies in various human populations. However, utilizing methods known in the art and the markers of the present invention, the invention can be practiced in any given human population.

Models to Predict Inherited Risk for Breast Cancer

The goal of breast cancer risk assessment is to provide a rational framework for the development of personalized medical management strategies for all women, with the aim of increasing survival and quality of life in high-risk women while minimizing costs, unnecessary interventions and anxiety in women at lower risk. Risk prediction models attempt to estimate the risk for breast cancer in an individual who has a given set of congenital risk characteristics (e.g., family history, prior benign breast lesion, previous breast tumor). The breast cancer risk assessment models most commonly employed in clinical practice estimate inherited risk factors by considering family history. The risk estimates are based on the observations of increased risk to individuals with one or more close relatives previously diagnosed with breast cancer. They do not take into account complex pedigree structures. These models have the further disadvantage of not being able to differentiate between carriers and non-carriers of breast cancer predisposing mutations.

More sophisticated risk models have better mechanisms to deal with specific family histories and have an ability to take into account carrier status for BRCA1 and BRCA2 mutations. For example, the Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA) (Antoniou et al., 2004) takes into account family history based on individual pedigree structures through the pedigree analysis program MENDEL. Information on known BRCA1 and BRCA2 status is also taken into account. The main limitation of BOADICEA and all other breast cancer risk models currently in use is that they do not incorporate genotypic information from other predisposition genes. Current models depend strongly on family history to act as a surrogate to compensate for the lack of knowledge of non-BRCA genetic determinants of risk. Therefore the available models are limited to situations where there is a known family history of disease. Lower penetrance breast cancer predisposition genes may be relatively common in the population and may not show such strong tendencies to drive familial clustering as do the BRCA1 and BRCA2 genes. Patients with a relatively high genetic load of predisposition alleles may show little or no family history of disease. There is therefore a need to construct models which incorporate inherited susceptibility data obtained directly through gene-based testing. In addition to making the models more precise, this will reduce the dependency on family history parameters and assist in the extension of risk profiling into the wider at-risk population where family history is not such a key factor.

Integration of Improved Genetic Risk Models into Clinical Management of Breast Cancer

Primary Prevention

Clinical primary prevention options currently can be classified as chemopreventative (or hormonal) treatments and prophylactic surgery. Patients identified as high risk can be prescribed long-term courses of chemopreventative therapies. This concept is well accepted in the field of cardiovascular medicine, but is only now beginning to make an impact in clinical oncology. The most widely used oncology chemopreventative is Tamoxifen, a Selective Estrogen Receptor Modulator (SERM). Initially used as an adjuvant therapy directed against breast cancer recurrence, Tamoxifen now has proven efficacy as a breast cancer preventative agent [Cuzick, et al., (2003), Lancet, 361, 296-300; Martino, et al., (2004), Oncologist, 9, 116-25]. The FDA has approved the use of Tamoxifen as a chemopreventative agent in certain high-risk women.

Unfortunately, long-term Tamoxifen use increases the risk of endometrial cancer approximately 2.5-fold and the risk of venous thrombosis approximately 2.0-fold. Risks of pulmonary embolism, stroke and cataracts are also increased [Cuzick, et al., (2003), Lancet, 361, 296-300]. Accordingly, the benefits of Tamoxifen use in reducing breast cancer incidence may not translate easily into corresponding decreases in overall mortality. Another SERM, Raloxifene, may be more efficacious in a preventative mode and does not carry the same risk of endometrial cancer. However, the risk of thrombosis is still elevated in patients treated long-term with Raloxifene [Cuzick, et al., (2003), Lancet, 361, 296-300; Martino, et al., (2004), Oncologist, 9, 116-25]. Moreover, both Tamoxifen and Raloxifene have quality-of-life issues associated with them. To make a rational risk:benefit analysis of SERM therapy in a chemopreventative mode, there is a clinical need to identify the individuals who are most at risk for breast cancer. Given that a substantial proportion of risk for breast cancer is genetic, there is a clear clinical need for genetic tests to quantify individuals' risks in this context. Similar issues can be anticipated for any future cancer chemopreventative therapies that may become available, such as the aromatase inhibitors. Moreover, as chemopreventative therapies become safer, there is an increased need to identify patients who are genetically predisposed but do not have the massively elevated risks associated with BRCA1 and BRCA2 mutations.

Patients who are identified as being at high risk for breast cancer are considered for prophylactic surgery: bilateral mastectomy, oophorectomy or both. Clearly such drastic treatments are recommended only for patients who are perceived to be at extremely high risk. In practice, such risks can currently be identified only in individuals who carry mutations in BRCA1, BRCA2 or genes known to be involved in rare breast cancer predisposition syndromes, such as p53 in Li-Fraumeni Syndrome and PTEN in Cowden's Syndrome.

Estimates of the penetrance of BRCA1 and BRCA2 mutations tend to be higher when they are derived from multiple-case families than when they are derived from population-based studies. This is because different mutation-carrying families exhibit different penetrances for breast cancer (see, for example, [Thorlacius, et al., (1997), Am J Hum Genet, 60, 1079-84]). One of the major factors contributing to this variation is the action of as yet unknown predisposition genes whose effects modify the penetrance of BRCA1 and BRCA2 mutations. Therefore the absolute risk to an individual who carries a mutation in the BRCA1 or BRCA2 genes cannot be accurately quantified in the absence of knowledge of the existence and action of modifying genes. Since the treatment options for BRCA1 and BRCA2 carriers can be severe, it is important in this context to quantify the risks to individual BRCA carriers with the greatest accuracy possible. There is a need, therefore, to identify predisposition genes whose effects modify the penetrance of breast cancer in BRCA1 and BRCA2 carriers and to develop improved risk assessment models based on these genes.

Furthermore, there are individuals who are perceived to be at very high risk for breast cancer, perhaps because of a strong family history of breast cancer, but in whom no mutations in known predisposition genes can be identified. Consideration of prophylactic surgery is difficult in such cases because one cannot test the individual to discover whether or not she has inherited a high-penetrance predisposition gene. Accordingly, the individual's risk cannot be assessed accurately. There is a clear clinical need, therefore, to identify any high-penetrance predisposition genes that remain undiscovered and to develop associated genetic tests for use in primary prevention strategies. Such genes may, for example, be the genes disclosed herein to be associated with risk of breast cancer (e.g., the FGF10, MRPS30 and/or FGFR2 genes). Although the variants shown herein to be associated with risk of breast cancer are fairly common and confer a relatively low risk of breast cancer, it is quite possible that higher-risk variants exist within one or more of these genes. It is thus contemplated that high-risk genetic variants within, or associated with, one or more of the FGF10, MRPS30 and/or FGFR2 genes could be useful for determining whether an individual is a carrier of a high-risk (and high-penetrance) genetic factor for breast cancer.

Early Diagnosis

Clinical screening for breast cancer in most western countries consists of periodic clinical breast examination (CBE) and X-ray mammography. There is good evidence to indicate that CBE has little added benefit when used in the context of a good mammographic screening program. In the United Kingdom, women between the ages of 50 and 70 are invited to undergo screening mammography every three years. The situation in the United States varies depending on healthcare provider; however, the American Cancer Society recommends annual mammographic screening from age 40. Mammographic screening has proven effectiveness in reducing mortality amongst screened women over the age of 50.

It is unlikely that genetic testing would ever be employed as a means of reducing access to existing mammographic screening programs. However, mammographic screening is not without shortcomings, and it is conceivable that genetic testing could be used to select people for augmented screening programs. One of the drawbacks of mammographic screening is that it has thus far not been possible to demonstrate a significant effect on improved survival for women screened under 50 years of age.

One reason that mammography is less effective in women under 50 may be that the density of breast tissue is higher in younger women, making mammographic detection of tumors more difficult. However, breast cancers in predisposed individuals tend to occur at early ages, and there is a clear association between high breast density and breast cancer risk. Therefore there is a problem with simple increases in mammographic screening for individuals with high predisposition, because they would be managed by a technique that performs sub-optimally in the group at highest risk. Recent studies have shown that contrast-enhanced magnetic resonance imaging (CE-MRI) is more sensitive and detects tumors at an earlier stage in this high-risk group than mammographic screening does [Warner, et al., (2004), JAMA, 292, 1317-25; Leach, et al., (2005), Lancet, 365, 1769-78]. CE-MRI strategies work particularly well when used in combination with routine X-ray mammography [Leach, et al., (2005), Lancet, 365, 1769-78]. Because CE-MRI requires specialist centers that incur high costs, screening of under-50s must be restricted to those individuals at the highest risk. Present CE-MRI trials restrict entry to those individuals with BRCA1, BRCA2 or p53 mutations or very strong family histories of disease. The extension of this screening modality to a wider range of high-risk patients would be greatly assisted by the provision of gene-based risk profiling tools.

There is good evidence to support the notion that early-onset breast cancers and cancers occurring in genetically predisposed women grow faster than cancers in older, less strongly predisposed women. This comes from the observation that interval cancers, that is, cancers that arise in the intervals between screening visits in a well-screened population, are more frequent amongst younger women. Therefore there are suggestions that screening intervals, by whatever method, should be reduced for younger women. There is a paradox here in that more frequent screening using more expensive methodologies seems to be required for an age group in which the overall rates of breast cancer are comparatively low. There is a clear clinical need here to identify those young individuals who are most strongly predisposed to develop the disease early, and to channel them into more expensive and extensive screening regimes. The variants disclosed herein to confer risk of breast cancer can be useful for identification of individuals who are at particularly high risk of developing breast cancer. Such individuals are likely to benefit most from early and aggressive screening programs, so as to maximize the likelihood of early identification of the cancer.

Treatment

Currently, primary breast cancer is treated by surgery, adjuvant chemotherapy and radiotherapy, followed by long-term hormonal therapy. Often combinations of three or four therapies are used.

Breast cancer patients with the same stage of disease can have very different responses to adjuvant chemotherapy, resulting in a broad variation in overall treatment outcomes. Consensus guidelines (the St Gallen and NIH criteria) have been developed for determining the eligibility of breast cancer patients for adjuvant chemotherapy treatment. However, even the strongest clinical and histological predictors of metastasis fail to predict accurately the clinical responses of breast tumors [Goldhirsch, et al., (1998), J Natl Cancer Inst, 90, 1601-8; Eifel, et al., (2001), J Natl Cancer Inst, 93, 979-89]. Chemotherapy or hormonal therapy reduces the risk of metastasis by only approximately ⅓, yet 70-80% of patients receiving this treatment would have survived without it. Therefore the majority of breast cancer patients are currently offered treatment that is either ineffective or unnecessary. There is a clear clinical need for improvements in the development of prognostic measures that will allow clinicians to tailor treatments more appropriately to those who will best benefit. It is reasonable to expect that profiling individuals for genetic predisposition may reveal information relevant to their treatment outcome and thereby aid in rational treatment planning. The markers of the present invention, conferring risk of breast cancer, are contemplated to be useful in this context.

Several previous studies exemplify this concept. Breast cancer patients who are BRCA mutation carriers appear to show better clinical response rates and survival when treated with adjuvant chemotherapies [Chappuis, et al., (2002), J Med Genet, 39, 608-10; Goffin, et al., (2003), Cancer, 97, 527-36]. BRCA mutation carriers demonstrate better responses to platinum chemotherapy for ovarian cancer than non-carriers [Cass, et al., (2003), Cancer, 97, 2187-95]. Similar considerations may apply to predisposed patients in whom the genes involved are not known. For example, infiltrating lobular breast carcinoma (ILBC) is known to have a strong familial component, but the genetic variants involved have not yet been identified. Patients with ILBC demonstrate poorer responses to common chemotherapy regimes [Mathieu, et al., (2004), Eur J Cancer, 40, 342-51].

Genetic predisposition models may not only aid in the individualization of treatment strategies, but may also play an integral role in the design of these strategies. For example, BRCA1 and BRCA2 mutant tumor cells have been found to be profoundly sensitive to poly(ADP-ribose) polymerase (PARP) inhibitors as a result of their defective DNA repair pathway [Farmer, et al., (2005), Nature, 434, 917-21]. This has stimulated development of small molecule drugs targeting PARP with a view to their use specifically in BRCA carrier patients. From this example it is clear that knowledge of genetic predisposition may identify drug targets that lead to the development of personalized chemotherapy regimes to be used in combination with genetic risk profiling. Similarly, the markers of the present invention may aid in the identification of novel drugs that target, for example, one or more of the FGF10, MRPS30 and/or FGFR2 genes.

Cancer chemotherapy has well-known, dose-limiting side effects on normal tissues, particularly the highly proliferative hematopoietic and gut epithelial cell compartments. It can be anticipated that genetically based individual differences exist in the sensitivities of normal tissues to cytotoxic drugs. An understanding of these factors might aid in rational treatment planning and in the development of drugs designed to protect normal tissues from the adverse effects of chemotherapy.

Genetic profiling may also contribute to improved radiotherapy approaches. Within groups of breast cancer patients undergoing standard radiotherapy regimes, a proportion of patients will experience adverse reactions to doses of radiation that are normally tolerated. Acute reactions include erythema, moist desquamation, edema and radiation pneumonitis. Long-term reactions, including telangiectasia, edema, pulmonary fibrosis and breast fibrosis, may arise many years after radiotherapy. Both acute and long-term reactions are considerable sources of morbidity and can be fatal. In one study, 87% of patients were found to have some adverse side effects of radiotherapy, while 11% had serious adverse reactions (LENT/SOMA Grade 3-4) [Hoeller, et al., (2003), Int J Radiat Oncol Biol Phys, 55, 1013-8]. The probability of experiencing an adverse reaction to radiotherapy is due primarily to constitutive individual differences in normal tissue reactions, and there is a suspicion that these have a strong genetic component. Several of the known breast cancer predisposition genes (e.g., BRCA1, BRCA2, ATM) affect pathways of DNA double-strand break repair. DNA double-strand breaks are the primary cytotoxic lesion induced by radiotherapy. This has led to concern that individuals who are genetically predisposed to breast cancer through carriage of variants in genes belonging to these pathways might also be at higher risk of suffering excessive normal tissue damage from radiotherapy. It is contemplated that the genetic variants described herein to confer risk of breast cancer, for example through one or more of the FGF10, MRPS30 and/or FGFR2 genes, may be useful for identifying individuals at particular risk of adverse reaction to radiotherapy.

The existence of constitutively radiosensitive individuals in the population means that radiotherapy dose rates for the majority of the patient population must be restricted in order to keep the frequency of adverse reactions to an acceptable level. There is a clinical need, therefore, for reliable tests that can identify individuals who are at elevated risk for adverse reactions to radiotherapy. Such tests would indicate conservative or alternative treatments for individuals who are radiosensitive, while permitting escalation of radiotherapeutic doses for the majority of patients who are relatively radioresistant. It has been estimated that the dose escalations made possible by a test to triage breast cancer patients simply into radiosensitive, intermediate and radioresistant categories would result in an approximately 35% increase in local tumor control and consequent improvements in survival rates [Burnet, et al., (1996), Clin Oncol (R Coll Radiol), 8, 25-34].

Exposure to ionizing radiation is a proven factor contributing to oncogenesis in the breast [Dumitrescu and Cotarla, (2005), J Cell Mol Med, 9, 208-21]. Known breast cancer predisposition genes encode pathway components of the cellular response to radiation-induced DNA damage [Narod and Foulkes, (2004), Nat Rev Cancer, 4, 665-76]. Accordingly, there is concern that the risk for second primary breast tumors may be increased by irradiation of normal tissues within the radiotherapy field. There does not appear to be any measurable increased risk for BRCA carriers from radiotherapy; however, their risk for second primary tumors is already exceptionally high. There is evidence to suggest that risk for second primary tumors is increased in carriers of breast cancer-predisposing alleles of the ATM and CHEK2 genes who are treated with radiotherapy [Bernstein, et al., (2004), Breast Cancer Res, 6, R199-214; Broeks, et al., (2004), Breast Cancer Res Treat, 83, 91-3]. It is expected that the risk of second primary tumors from radiotherapy (and, possibly, from intensive mammographic screening) will be better defined by obtaining accurate genetic risk profiles from patients during the treatment planning stage.

Secondary Prevention

Approximately 30% of patients who are diagnosed with a stage 1 or 2 breast cancer will experience either a loco-regional or distant metastatic recurrence of their original tumor. Patients who have had a primary breast cancer are also at greatly increased risk of being diagnosed with a second primary tumor, either in the contralateral breast or, when breast-conserving surgery has been carried out, in the ipsilateral breast. Secondary prevention refers to methods used to prevent recurrences or second primary tumors from developing. Methods currently in use comprise: long-term treatment with Tamoxifen or another SERM, either alone or alternated with an aromatase inhibitor; risk-reducing mastectomy of the contralateral breast; and risk-reducing oophorectomy (in patients who are at risk for familial breast-ovarian cancer). Considerations regarding the use of Tamoxifen have been discussed above. With risk-reducing surgical options, it is clear that the risk needs to be quantified as well as possible in order to make an informed cost:benefit analysis.

There are some indications that patients with known genetic predispositions to breast cancer fare worse than the majority of patients. Patients carrying the CHEK2 gene 1100delC variant have an estimated 2.8-fold increased risk of distant metastasis and a 3.9-fold increased risk of disease recurrence compared to non-carriers [de Bock, et al., (2004), J Med Genet, 41, 731-5]. Patients with BRCA1 node-negative tumors have a greater risk of metastasis than similar patients who do not carry a BRCA1 mutation [Goffin, et al., (2003), Cancer, 97, 527-36; Moller, et al., (2002), Int J Cancer, 101, 555-9; Eerola, et al., (2001), Int J Cancer, 93, 368-72]. Genetic profiling can therefore be used to help assess the risk of local recurrence and metastasis, thereby guiding the choice of secondary preventative treatment. Genetic profiling based on the variants described herein may be useful in this context. In certain embodiments, such profiling may be based on one or more of the variants described herein. In other embodiments, such profiling may include one or several other known genetic risk factors for breast cancer. Such risk factors may be well-established high-penetrance risk factors, or they may be one or more of the common, lower-penetrance risk factors that have been previously described (e.g., markers on chromosome 2q14, such as marker rs4848543 or markers in linkage disequilibrium therewith; on chromosome 2q35, such as marker rs13387042 or markers in linkage disequilibrium therewith; and on chromosome 16, such as marker rs3803662 or markers in linkage disequilibrium therewith).

In general, patients with a primary tumor diagnosis are at risk for second primary tumors at a constant annual incidence of 0.7% [Peto and Mack, (2000), Nat Genet, 26, 411-4]. Patients with BRCA mutations are at significantly greater risk for second primary tumors than most breast cancer patients, with absolute risks in the range 40-60% [Easton, (1999), Breast Cancer Res, 1, 14-7]. Carriers of BRCA mutations have a greatly increased risk for second primary tumors [Stacey, et al., (2006), PLoS Med, 3, e217; Metcalfe, et al., (2004), J Clin Oncol, 22, 2328-35]. Patients with mutations in the CHEK2 gene have an estimated 5.7-fold increased risk of contralateral breast cancer [de Bock, et al., (2004), J Med Genet, 41, 731-5]. Carriers of the BARD1 Cys557Ser variant are 2.7-fold more likely to be diagnosed with a second primary tumor [Stacey, et al., (2006), PLoS Med, 3, e217]. Genetic risk profiling can be used to assess the risk of second primary tumors in patients and will inform decisions on how aggressive the preventative measures should be.
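To make the 0.7% annual incidence figure concrete, one can assume, as a simplification, that the annual risk stays constant and that each year is independent; the cumulative risk of a second primary tumor over n years is then 1 - (1 - 0.007)^n. The short Python sketch below illustrates that arithmetic only; the function name `cumulative_risk` is hypothetical and the model ignores competing risks such as death from other causes.

```python
def cumulative_risk(annual_rate: float, years: int) -> float:
    """Probability of at least one event within `years`, assuming a constant,
    independent annual risk (a simplifying assumption, not a fitted model)."""
    if not 0.0 <= annual_rate <= 1.0:
        raise ValueError("annual_rate must be a probability")
    # Complement of remaining event-free in every one of the years.
    return 1.0 - (1.0 - annual_rate) ** years


# Illustrative horizons using the 0.7% annual incidence cited above.
for horizon in (5, 10, 20):
    print(f"{horizon:>2} years: {cumulative_risk(0.007, horizon):.1%}")
```

Over 10 years this gives roughly 6.8%, slightly below the 7% obtained by simply multiplying the annual rate by ten, because an individual can only be diagnosed with a first second-primary once.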

Methods of the Invention

Diagnostic and Screening Methods

In certain embodiments, the present invention pertains to methods of diagnosing, or aiding in the diagnosis of, breast cancer or a susceptibility to breast cancer, by detecting particular alleles at genetic markers that appear more frequently in breast cancer subjects or subjects who are susceptible to breast cancer. In particular embodiments, the invention is a method of determining a susceptibility to breast cancer by detecting at least one allele of at least one polymorphic marker (e.g., the markers described herein). In other embodiments, the invention relates to a method of diagnosing a susceptibility to breast cancer by detecting at least one allele of at least one polymorphic marker. The present invention describes methods whereby detection of particular alleles of particular markers or haplotypes is indicative of a susceptibility to breast cancer. Such prognostic or predictive assays can also be used to determine prophylactic treatment of a subject prior to the onset of symptoms associated with breast cancer. The present invention pertains in some embodiments to methods of clinical applications of diagnosis, e.g., diagnosis performed by a medical professional. In other embodiments, the invention pertains to methods of diagnosis or determination of a susceptibility performed by a layman. The layman can be the customer of a genotyping service. The layman may also be a genotype service provider, who performs genotype analysis on a DNA sample from an individual in order to provide service related to genetic risk factors for particular traits or diseases, based on the genotype status of the individual (i.e., the customer).
Recent technological advances in genotyping technologies, including high-throughput genotyping of SNP markers, such as Molecular Inversion Probe array technology (e.g., Affymetrix GeneChip) and BeadArray Technologies (e.g., Illumina GoldenGate and Infinium assays), have made it possible for individuals to have their own genome assessed for up to one million SNPs simultaneously, at relatively little cost. The resulting genotype information, which can be made available to the individual, can be compared to information about disease or trait risk associated with various SNPs, including information from public literature and scientific publications. The diagnostic application of disease-associated alleles as described herein can thus, for example, be performed by the individual through analysis of his/her genotype data, by a health professional based on results of a clinical test, or by a third party, including the genotype service provider. The third party may also be a service provider who interprets genotype information from the customer to provide service related to specific genetic risk factors, including the genetic markers described herein. In other words, the diagnosis or determination of a susceptibility of genetic risk can be made by health professionals, genetic counselors, third parties providing genotyping service, third parties providing risk assessment service or by the layman (e.g., the individual), based on information about the genotype status of an individual and knowledge about the risk conferred by particular genetic risk factors (e.g., particular SNPs). In the present context, the terms “diagnosing”, “diagnose a susceptibility” and “determine a susceptibility” are meant to refer to any available diagnostic method, including those mentioned above.

In certain embodiments, a sample containing genomic DNA from an individual is collected. Such a sample can, for example, be a buccal swab, a saliva sample, a blood sample, or another suitable sample containing genomic DNA, as described further herein. The genomic DNA is then analyzed using any common technique available to the skilled person, such as high-throughput array technologies. Results from such genotyping are stored in a convenient data storage unit, such as a data carrier, including computer databases, data storage disks, or other convenient data storage means. In certain embodiments, the computer database is an object database, a relational database or a post-relational database. The genotype data is subsequently analyzed for the presence of certain variants known to be susceptibility variants for a particular human condition, such as the genetic variants described herein. Genotype data can be retrieved from the data storage unit using any convenient data query method. Calculating the risk conferred by a particular genotype for the individual can be based on comparing the genotype of the individual to a previously determined risk (expressed as a relative risk (RR) or an odds ratio (OR), for example) for the genotype, for example for a heterozygous carrier of an at-risk variant for a particular disease or trait (such as breast cancer). The calculated risk for the individual can be the relative risk for a person, or for a specific genotype of a person, compared to the average population with matched gender and ethnicity. The average population risk can be expressed as a weighted average of the risks of different genotypes, using results from a reference population, and the appropriate calculations to calculate the risk of a genotype group relative to the population can then be performed.
Alternatively, the risk for an individual is based on a comparison of particular genotypes, for example heterozygous carriers of an at-risk allele of a marker compared with non-carriers of the at-risk allele. Using the population average may in certain embodiments be more convenient, since it provides a measure which is easy to interpret for the user, i.e., a measure that gives the risk for the individual, based on his/her genotype, compared with the average in the population. The calculated risk estimate can be made available to the customer via a website, preferably a secure website.
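The two ways of expressing genotype risk described above (relative to non-carriers, or relative to the population average) can be sketched in Python. This sketch rests on stated assumptions that are not taken from the source: Hardy-Weinberg equilibrium in the reference population, a multiplicative per-allele model, and an odds ratio that approximates the relative risk (reasonable mainly when the disease is relatively rare). The function name `genotype_risks` and the example allele frequency and odds ratio are hypothetical.

```python
def genotype_risks(risk_allele_freq, per_allele_or):
    """Risks for 0, 1 and 2 copies of an at-risk allele, expressed relative to
    non-carriers and relative to the population average.

    Assumes Hardy-Weinberg equilibrium and a multiplicative per-allele model,
    with the odds ratio used as an approximation of relative risk.
    """
    p = risk_allele_freq
    # Hardy-Weinberg genotype frequencies for 0, 1 and 2 copies.
    freqs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    # Multiplicative model: each copy multiplies risk by the per-allele OR.
    vs_noncarrier = [per_allele_or ** k for k in range(3)]
    # Population average risk is the frequency-weighted average over genotypes.
    population_avg = sum(f * r for f, r in zip(freqs, vs_noncarrier))
    vs_population = [r / population_avg for r in vs_noncarrier]
    return vs_noncarrier, vs_population


# Hypothetical marker: risk-allele frequency 0.4, per-allele OR 1.2.
vs_non, vs_pop = genotype_risks(0.4, 1.2)
for copies in range(3):
    print(f"{copies} copies: {vs_non[copies]:.2f} vs non-carriers, "
          f"{vs_pop[copies]:.2f} vs population")
```

On the population scale, non-carriers fall below 1.0 because the population average already includes carriers; this is why the population-relative figure is easier for a customer to interpret than the carrier versus non-carrier comparison.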

In certain embodiments, a service provider will include in the provided service all of the steps of isolating genomic DNA from a sample provided by the customer, performing genotyping of the isolated DNA, calculating genetic risk based on the genotype data, and reporting the risk to the customer. In some other embodiments, the service provider will include in the service the interpretation of genotype data for the individual, i.e., risk estimates for particular genetic variants based on the genotype data for the individual. In some other embodiments, the service provider may include service that includes genotyping service and interpretation of the genotype data, starting from a sample of isolated DNA from the individual (the customer).

The overall risk for multiple risk variants can be calculated using standard methodology. For example, assuming a multiplicative model, i.e., assuming that the risks of individual risk variants multiply to establish the overall effect, allows for a straightforward calculation of the overall risk for multiple markers.
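Under the multiplicative model, the combined risk is simply the product of the genotype-specific risks at each variant. A minimal Python sketch follows; the function name `combined_relative_risk` and the per-marker values are hypothetical, and a value below 1 stands for a protective genotype.

```python
import math


def combined_relative_risk(per_variant_risks):
    """Overall risk under a multiplicative model: the product of the
    genotype-specific relative risks of the individual variants."""
    risks = list(per_variant_risks)
    if any(r <= 0 for r in risks):
        raise ValueError("relative risks must be positive")
    # math.prod (Python 3.8+) returns 1 for an empty list, i.e. no added risk.
    return math.prod(risks)


# Hypothetical genotype-specific risks at three markers assumed to act independently.
print(round(combined_relative_risk([1.2, 1.1, 0.9]), 3))  # 1.188
```

The multiplicative model is an assumption about how the variants combine; interacting variants would need a different model.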

In addition, in certain other embodiments, the present invention pertains to methods of diagnosing, or aiding in the diagnosis of, a decreased susceptibility to breast cancer, by detecting particular genetic marker alleles or haplotypes that appear less frequently in breast cancer patients than in individuals not diagnosed with breast cancer or in the general population.

As described and exemplified herein, particular marker alleles or haplotypes (e.g., markers on Chromosome 5p12 and 10q26, e.g., the markers and haplotypes as listed in Tables 12, 13 and 14, e.g., markers rs4415084, rs10941679 and rs1219648, and markers in linkage disequilibrium therewith) are associated with breast cancer. In one embodiment, the marker allele or haplotype is one that confers a significant risk or susceptibility to breast cancer. In another embodiment, the invention relates to a method of diagnosing a susceptibility to breast cancer in a human individual, the method comprising determining the presence or absence of at least one allele of at least one polymorphic marker in a nucleic acid sample obtained from the individual, wherein the at least one polymorphic marker is selected from the group consisting of the polymorphic markers listed in Tables 12, 13 and 14, and markers in linkage disequilibrium therewith. In another embodiment, the invention pertains to methods of diagnosing a susceptibility to breast cancer in a human individual by screening for at least one marker allele or haplotype as listed in Table 12, 13 or 14, or markers in linkage disequilibrium therewith. In another embodiment, the marker allele or haplotype is more frequently present in a subject having, or who is susceptible to, breast cancer (affected), as compared to the frequency of its presence in a healthy subject (control, such as population controls). In certain embodiments, the significance of association of the at least one marker allele or haplotype is characterized by a p value < 0.05. In other embodiments, the significance of association is characterized by smaller p-values, such as < 0.01, < 0.001, < 0.0001, < 0.00001, < 0.000001, < 0.0000001, < 0.00000001 or < 0.000000001.
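As an illustration of how an allele's greater frequency in affected individuals can be characterized by a p value, the Python sketch below computes an allelic odds ratio and a Pearson chi-square test (1 degree of freedom, no continuity correction) from a 2×2 table of allele counts in cases and controls. This is one standard test chosen here for illustration; it is not necessarily the analysis behind Tables 12, 13 and 14, and the function name and counts are hypothetical.

```python
import math


def allelic_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic odds ratio and 1-df Pearson chi-square p value from allele counts.

    Arguments are counts of the tested allele and of the other allele(s)
    among affected individuals (cases) and among controls.
    """
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    if min(a, b, c, d) <= 0:
        raise ValueError("all allele counts must be positive")
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical counts: 1,000 alleles typed in cases and 1,000 in controls.
or_, chi2, p = allelic_association(600, 400, 500, 500)
print(f"OR = {or_:.2f}, chi2 = {chi2:.2f}, p = {p:.2e}")
```

With these hypothetical counts the odds ratio is 1.50 and the p value falls between 0.000001 and 0.0001, so the association would meet several, but not all, of the significance thresholds listed above.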

In these embodiments, the presence of the at least one marker allele or haplotype is indicative of a susceptibility to breast cancer. These diagnostic methods involve detecting the presence or absence of at least one marker allele or haplotype that is associated with breast cancer. The haplotypes described herein include combinations of alleles at various genetic markers (e.g., SNPs, microsatellites, or other genetic variants). The detection of the particular genetic marker alleles that make up the particular haplotypes can be performed by a variety of methods described herein and/or known in the art. For example, genetic markers can be detected at the nucleic acid level (e.g., by direct nucleotide sequencing or by other means known to those skilled in the art) or at the amino acid level if the genetic marker affects the coding sequence of a protein encoded by a breast cancer-associated nucleic acid (e.g., by protein sequencing or by immunoassays using antibodies that recognize such a protein). The marker alleles or haplotypes of the present invention correspond to fragments of a genomic DNA sequence associated with breast cancer. Such fragments encompass the DNA sequence of the polymorphic marker or haplotype in question, but may also include DNA segments in strong LD (linkage disequilibrium) with the marker or haplotype. In one embodiment, such segments comprise segments in LD with the marker or haplotype (as determined by a value of r² greater than 0.1 and/or |D′| > 0.8).
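The LD criterion above (r² greater than 0.1 and/or |D′| > 0.8) can be evaluated from the frequency of a two-marker haplotype and the two allele frequencies, using the standard definitions D = p_AB - p_A·p_B, D′ = D / D_max (Lewontin's normalization) and r² = D² / (p_A(1 - p_A)p_B(1 - p_B)). The Python sketch below assumes that the haplotype frequency has already been estimated, for example from phased genotype data, and reads "and/or" as satisfying either threshold. The function names and example frequencies are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for allele A at one biallelic marker and allele B at
    another, given the A-B haplotype frequency and the two allele frequencies."""
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must be strictly between 0 and 1")
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


def in_ld(p_ab, p_a, p_b, r2_min=0.1, d_prime_min=0.8):
    """Apply the r^2 > 0.1 and/or |D'| > 0.8 criterion (either one suffices)."""
    _, d_prime, r2 = ld_measures(p_ab, p_a, p_b)
    return r2 > r2_min or abs(d_prime) > d_prime_min


# Hypothetical frequencies: haplotype AB at 0.25, allele A at 0.3, allele B at 0.4.
d, d_prime, r2 = ld_measures(0.25, 0.3, 0.4)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}, in LD: {in_ld(0.25, 0.3, 0.4)}")
```

In the example, |D′| is about 0.72 (below 0.8) but r² is about 0.34 (above 0.1), so the two markers would still satisfy the criterion through r².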

In one embodiment, diagnosis of a susceptibility to breast cancer can be accomplished using hybridization methods (see Current Protocols in Molecular Biology, Ausubel, F. et al., eds., John Wiley & Sons, including all supplements). A biological sample of genomic DNA, RNA, or cDNA (a “test sample”) is obtained from a subject suspected of having, being susceptible to, or predisposed for breast cancer (the “test subject”). The subject can be an adult, child, or fetus. The test sample can be from any source that contains genomic DNA, such as a blood sample, sample of amniotic fluid, sample of cerebrospinal fluid, or tissue sample from skin, muscle, buccal or conjunctival mucosa, placenta, gastrointestinal tract or other organs. A test sample of DNA from fetal cells or tissue can be obtained by appropriate methods, such as by amniocentesis or chorionic villus sampling. The DNA, RNA, or cDNA sample is then examined. The presence of a specific marker allele can be indicated by sequence-specific hybridization of a nucleic acid probe specific for the particular allele. The presence of more than one specific marker allele or a specific haplotype can be indicated by using several sequence-specific nucleic acid probes, each being specific for a particular allele. In one embodiment, a haplotype can be indicated by a single nucleic acid probe that is specific for the specific haplotype (i.e., hybridizes specifically to a DNA strand comprising the specific marker alleles characteristic of the haplotype). A sequence-specific probe can be directed to hybridize to genomic DNA, RNA, or cDNA. A “nucleic acid probe”, as used herein, can be a DNA probe or an RNA probe that hybridizes to a complementary sequence. One of skill in the art would know how to design such a probe so that sequence-specific hybridization will occur only if a particular allele is present in a genomic sequence from a test sample.
The invention can also be reduced to practice usingany convenient genotyping method, including commercially availabletechnologies and methods for genotyping particular polymorphic markers.

To diagnose a susceptibility to breast cancer, a hybridization sample can be formed by contacting the test sample containing a breast cancer-associated nucleic acid, such as a genomic DNA sample, with at least one nucleic acid probe. A non-limiting example of a probe for detecting mRNA or genomic DNA is a labeled nucleic acid probe that is capable of hybridizing to mRNA or genomic DNA sequences described herein. The nucleic acid probe can be, for example, a full-length nucleic acid molecule, or a portion thereof, such as an oligonucleotide of at least 15, 30, 50, 100, 250 or 500 nucleotides in length that is sufficient to specifically hybridize under stringent conditions to appropriate mRNA or genomic DNA. For example, the nucleic acid probe can comprise all or a portion of a nucleotide sequence comprising the markers set forth in Tables 12, 13 and 14 (SEQ ID NO:1-237), or a nucleotide sequence comprising the FGF10, MRPS30, HCN1 or FGFR2 genes or fragments thereof, as described herein, optionally comprising at least one allele of a marker described herein, or at least one haplotype described herein, or the probe can be the complementary sequence of such a sequence. In a particular embodiment, the nucleic acid probe is a portion of a nucleotide sequence comprising the markers listed in any one of Tables 12, 13 and 14 (SEQ ID NO:1-237), or of a nucleotide sequence comprising the FGF10, MRPS30, HCN1 and FGFR2 genes or fragments thereof, as described herein, optionally comprising at least one allele of a marker described herein, or at least one allele of one polymorphic marker or haplotype comprising at least one polymorphic marker described herein, or the probe can be the complementary sequence of such a sequence. Other suitable probes for use in the diagnostic assays of the invention are described herein. Hybridization can be performed by methods well known to the person skilled in the art (see, e.g., Current Protocols in Molecular Biology, Ausubel, F. et al., eds., John Wiley & Sons, including all supplements). In one embodiment, hybridization refers to specific hybridization, i.e., hybridization with no mismatches (exact hybridization). In one embodiment, the hybridization conditions for specific hybridization are high stringency.

Specific hybridization, if present, is detected using standard methods. If specific hybridization occurs between the nucleic acid probe and the nucleic acid in the test sample, then the sample contains the allele that is complementary to the nucleotide that is present in the nucleic acid probe. The process can be repeated for any markers of the present invention, or markers that make up a haplotype of the present invention, or multiple probes can be used concurrently to detect more than one marker allele at a time. It is also possible to design a single probe containing more than one marker allele of a particular haplotype (e.g., a probe containing alleles complementary to 2, 3, 4, 5 or all of the markers that make up a particular haplotype). Detection of the particular markers of the haplotype in the sample is indicative that the source of the sample has the particular haplotype (e.g., a haplotype) and therefore is susceptible to breast cancer.

In one preferred embodiment, a method utilizing a detection oligonucleotide probe comprising a fluorescent moiety or group at its 3′ terminus and a quencher at its 5′ terminus, and an enhancer oligonucleotide, is employed, as described by Kutyavin et al. (Nucleic Acids Res. 34:e128 (2006)). The fluorescent moiety can be Gig Harbor Green or Yakima Yellow, or other suitable fluorescent moieties. The detection probe is designed to hybridize to a short nucleotide sequence that includes the SNP polymorphism to be detected. Preferably, the SNP is anywhere from the terminal residue to −6 residues from the 3′ end of the detection probe. The enhancer is a short oligonucleotide probe which hybridizes to the DNA template 3′ relative to the detection probe. The probes are designed such that a single nucleotide gap exists between the detection probe and the enhancer oligonucleotide probe when both are bound to the template. The gap creates a synthetic abasic site that is recognized by an endonuclease, such as Endonuclease IV. The enzyme cleaves the dye off the fully complementary detection probe, but cannot cleave a detection probe containing a mismatch. Thus, by measuring the fluorescence of the released fluorescent moiety, the presence of a particular allele defined by the nucleotide sequence of the detection probe can be assessed.
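The allele-discrimination principle can be illustrated with a deliberately simplified model, sketched below in Python under stated assumptions: each allele has its own detection probe carrying a distinguishable dye, and a dye is counted as released only when its probe pairs with the template site with no mismatch. The sketch does not model the enhancer, the abasic-site structure, or enzyme kinetics; the function names and the example sequences are hypothetical.

```python
# Simplified model of allele-specific detection with dye-labelled probes.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def snp_in_cleavage_window(probe_len: int, snp_index: int, max_offset: int = 6) -> bool:
    """True if the SNP (0-based index in the probe, read 5'->3') lies between the
    3'-terminal residue (offset 0) and max_offset residues from the 3' end."""
    offset_from_3prime = probe_len - 1 - snp_index
    return 0 <= offset_from_3prime <= max_offset


def probe_is_cleaved(probe: str, target_site: str) -> bool:
    """Assumed rule: the dye is released only if the probe is the exact reverse
    complement of the template site (no mismatch anywhere)."""
    return revcomp(probe) == target_site.upper()


def call_genotype(target_sites: list[str], probes: dict[str, str]) -> tuple[str, ...]:
    """Return the alleles whose probe is cleaved on at least one template copy."""
    detected = {
        allele
        for allele, probe in probes.items()
        if any(probe_is_cleaved(probe, site) for site in target_sites)
    }
    return tuple(sorted(detected))


if __name__ == "__main__":
    ref_site, alt_site = "TTGACCGAAT", "TTGACTGAAT"  # hypothetical C/T site
    probes = {"C": revcomp(ref_site), "T": revcomp(alt_site)}
    print(call_genotype([ref_site, alt_site], probes))  # heterozygote
    print(call_genotype([ref_site, ref_site], probes))  # C homozygote
```

In this example the SNP sits at index 4 of each 10-nucleotide probe, i.e. 5 residues from the 3′ end, inside the preferred window described above.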

The detection probe can be of any suitable size, although preferably the probe is relatively short. In one embodiment, the probe is from 5-100 nucleotides in length. In another embodiment, the probe is from 10-50 nucleotides in length, and in another embodiment, the probe is from 12-30 nucleotides in length. Other lengths of the probe are possible and within the scope of the skill of the average person skilled in the art.

In a preferred embodiment, the DNA template containing the SNP polymorphism is amplified by Polymerase Chain Reaction (PCR) prior to detection. In such an embodiment, the amplified DNA serves as the template for the detection probe and the enhancer probe.

Certain embodiments of the detection probe, the enhancer probe, and/or the primers used for amplification of the template by PCR include the use of modified bases, including modified A and modified G. The use of modified bases can be useful for adjusting the melting temperature of the nucleotide molecule (probe and/or primer) to the template DNA, for example for increasing the melting temperature in regions containing a low percentage of G or C bases, in which modified A with the capability of forming three hydrogen bonds to its complementary T can be used, or for decreasing the melting temperature in regions containing a high percentage of G or C bases, for example by using modified G bases that form only two hydrogen bonds to their complementary C base in a double-stranded DNA molecule. In a preferred embodiment, modified bases are used in the design of the detection nucleotide probe. Any modified base known to the skilled person can be selected in these methods, and the selection of suitable bases is well within the scope of the skilled person based on the teachings herein and known bases available from commercial sources as known to the skilled person.
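The effect of base composition on melting temperature can be roughly illustrated with the Wallace rule, Tm ≈ 2 °C × (number of A/T) + 4 °C × (number of G/C), a crude estimate for short oligonucleotides. In the Python sketch below, scoring a modified A as a G/C-strength pair and a modified G as an A/T-strength pair is an illustrative assumption based only on the hydrogen-bond counts described above, not a validated model for any particular commercial base; the lowercase symbols `a` and `g` are a hypothetical notation. Nearest-neighbor thermodynamic models are generally preferred for actual probe design.

```python
def wallace_tm(seq: str) -> int:
    """Rough Tm in deg C: 2 per weak (A/T) base and 4 per strong (G/C) base.

    Hypothetical notation: lowercase 'a' is a modified A assumed to pair with
    three hydrogen bonds (scored like G/C); lowercase 'g' is a modified G
    assumed to pair with two (scored like A/T).
    """
    weak = sum(seq.count(base) for base in "ATg")
    strong = sum(seq.count(base) for base in "GCa")
    if weak + strong != len(seq):
        raise ValueError("sequence may contain only A, C, G, T, a, g")
    return 2 * weak + 4 * strong


if __name__ == "__main__":
    at_rich = "AATTAATTAATTAATT"
    print(wallace_tm(at_rich), wallace_tm("aATTaATTaATTaATT"))  # 32 40
    gc_rich = "GGCCGGCCGGCCGGCC"
    print(wallace_tm(gc_rich), wallace_tm("gGCCgGCCgGCCgGCC"))  # 64 56
```

Under these assumptions, four modified A residues raise the estimate for an A/T-rich 16-mer, and four modified G residues lower it for a G/C-rich 16-mer, mirroring the two directions of adjustment described in the text.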

Additionally, or alternatively, a peptide nucleic acid (PNA) probe can be used in addition to, or instead of, a nucleic acid probe in the hybridization methods described herein. A PNA is a DNA mimic having a peptide-like, uncharged backbone, such as N-(2-aminoethyl)glycine units, with an organic base (A, G, C, T or U) attached to the glycine nitrogen via a methylene carbonyl linker (see, for example, Nielsen, P., et al., Bioconjug. Chem. 5:3-7 (1994)). The PNA probe can be designed to specifically hybridize to a molecule in a sample suspected of containing one or more of the marker alleles or haplotypes that are associated with breast cancer. Hybridization of the PNA probe is thus diagnostic for breast cancer or a susceptibility to breast cancer.

In one embodiment of the invention, a test sample containing genomic DNA obtained from the subject is collected and the polymerase chain reaction (PCR) is used to amplify a fragment comprising one or more markers or haplotypes of the present invention. As described herein, identification of a particular marker allele or haplotype associated with breast cancer can be accomplished using a variety of methods (e.g., sequence analysis, analysis by restriction digestion, specific hybridization, single-stranded conformation polymorphism assays (SSCP), electrophoretic analysis, etc.). In another embodiment, diagnosis is accomplished by expression analysis, for example by using quantitative PCR (kinetic thermal cycling). This technique can, for example, utilize commercially available technologies, such as TaqMan® (Applied Biosystems, Foster City, Calif.). The technique can assess the presence of an alteration in the expression or composition of a polypeptide or splicing variant(s) that is encoded by a nucleic acid associated with breast cancer. Further, the expression of the variant(s) can be quantified as physically or functionally different.
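One widely used way to turn quantitative PCR threshold cycles (Ct) into a relative expression value is the 2^-ΔΔCt method, sketched below in Python. It normalizes the target gene to a reference gene within each sample and then compares the test sample with a control sample; it assumes equal, near-perfect amplification efficiency for both assays. The function names, the choice of reference gene, and the Ct values in the example are hypothetical.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Normalize a target-gene Ct to a reference-gene Ct measured in the same sample."""
    return ct_target - ct_reference


def fold_change_ddct(
    ct_target_test: float,
    ct_ref_test: float,
    ct_target_ctrl: float,
    ct_ref_ctrl: float,
    amplification_factor: float = 2.0,
) -> float:
    """Expression in the test sample relative to the control sample.

    amplification_factor=2.0 assumes the product doubles every cycle
    (100% efficiency) for both the target and the reference assay.
    """
    ddct = delta_ct(ct_target_test, ct_ref_test) - delta_ct(ct_target_ctrl, ct_ref_ctrl)
    return amplification_factor ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values: after normalization the target crosses threshold
    # 2 cycles earlier in the test sample, i.e. about 4-fold higher expression.
    print(fold_change_ddct(24.0, 18.0, 26.0, 18.0))
```

Because Ct is logarithmic in starting template amount, each cycle of difference corresponds to roughly a two-fold difference in expression under the stated efficiency assumption.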

In another method of the invention, analysis by restriction digestion can be used to detect a particular allele if the allele results in the creation or elimination of a restriction site relative to a reference sequence. Restriction fragment length polymorphism (RFLP) analysis can be conducted, e.g., as described in Current Protocols in Molecular Biology, supra. The digestion pattern of the relevant DNA fragment indicates the presence or absence of the particular allele in the sample.
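The following Python sketch shows how an allele that creates or destroys a restriction site changes the predicted fragment pattern. It assumes complete digestion of a linear amplicon and reports top-strand fragment lengths; EcoRI (recognition site GAATTC, cut between G and A) is used purely as an example enzyme, and the function name and amplicon sequences are hypothetical.

```python
import re


def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Top-strand fragment lengths after complete digestion of a linear fragment.

    cut_offset is the position of the cut within the recognition site on the
    top strand (1 for EcoRI, which cuts G^AATTC).
    """
    seq = seq.upper()
    # A lookahead finds every occurrence, including overlapping ones.
    starts = [m.start() for m in re.finditer(f"(?={re.escape(site.upper())})", seq)]
    bounds = [0] + [s + cut_offset for s in starts] + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    ECORI, ECORI_CUT = "GAATTC", 1
    allele_1 = "CCCCGAATTCCCCC"  # hypothetical allele that creates the site
    allele_2 = "CCCCGAGTTCCCCC"  # hypothetical allele that abolishes it
    print(digest(allele_1, ECORI, ECORI_CUT))
    print(digest(allele_2, ECORI, ECORI_CUT))
```

Here the first allele yields two fragments (5 and 9 nucleotides) and the second remains uncut (14 nucleotides), so the band pattern on a gel would distinguish the two alleles, and a heterozygote would show all three bands.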

Sequence analysis can also be used to detect specific alleles or haplotypes associated with breast cancer (e.g., the polymorphic markers of Tables 12, 13 and 14 (SEQ ID NO:1-237) and markers in linkage disequilibrium therewith). Therefore, in one embodiment, determination of the presence or absence of particular marker alleles or haplotypes comprises sequence analysis of a test sample of DNA or RNA obtained from a subject or individual. PCR or other appropriate methods can be used to amplify a portion of a nucleic acid associated with breast cancer, and the presence of a specific allele can then be detected directly by sequencing the polymorphic site (or multiple polymorphic sites in a haplotype) of the genomic DNA in the sample.

Allele-specific oligonucleotides can also be used to detect the presence of a particular allele in a nucleic acid associated with breast cancer (e.g., the polymorphic markers of Tables 12, 13 and 14, and markers in linkage disequilibrium therewith), through the use of dot-blot hybridization of amplified nucleic acids with allele-specific oligonucleotide (ASO) probes (see, for example, Saiki, R. et al., Nature, 324:163-166 (1986)). An “allele-specific oligonucleotide” (also referred to herein as an “allele-specific oligonucleotide probe”) is an oligonucleotide of approximately 10-50 nucleotides or approximately 15-30 nucleotides that specifically hybridizes to a nucleic acid associated with breast cancer, and which contains a specific allele at a polymorphic site (e.g., a marker or haplotype as described herein). An allele-specific oligonucleotide probe that is specific for one or more particular nucleic acids associated with breast cancer can be prepared using standard methods (see, e.g., Current Protocols in Molecular Biology, supra). PCR can be used to amplify the desired region. The DNA containing the amplified region can be dot-blotted using standard methods (see, e.g., Current Protocols in Molecular Biology, supra), and the blot can be contacted with the oligonucleotide probe. The presence of specific hybridization of the probe to the amplified region can then be detected. Specific hybridization of an allele-specific oligonucleotide probe to DNA from the subject is indicative of a specific allele at a polymorphic site associated with cancer, including breast cancer (see, e.g., Gibbs, R. et al., Nucleic Acids Res., 17:2437-2448 (1989) and WO 93/22456).
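A minimal sketch of ASO probe design in Python follows: for each allele, it takes equal lengths of sequence on either side of the polymorphic site so that the allele sits at the center of the probe, where a single mismatch is generally more destabilizing than at either end. The default 17-nucleotide length falls within the 15-30 nucleotide range given above; the flanking sequences and function name are hypothetical, and a real design would also check Tm, secondary structure and uniqueness in the genome.

```python
def design_aso_probes(
    flank5: str, flank3: str, alleles: list[str], half_len: int = 8
) -> dict[str, str]:
    """One probe per allele, 5'->3' on the same strand as the flanks, with the
    polymorphic base at the center (probe length = 2 * half_len + 1)."""
    if len(flank5) < half_len or len(flank3) < half_len:
        raise ValueError("each flank must be at least half_len nucleotides long")
    left = flank5[-half_len:].upper()   # bases immediately 5' of the SNP
    right = flank3[:half_len].upper()   # bases immediately 3' of the SNP
    return {allele.upper(): left + allele.upper() + right for allele in alleles}


if __name__ == "__main__":
    probes = design_aso_probes("AAAACCCCGGGGTTTT", "ACGTACGTACGT", ["C", "T"])
    for allele, probe in probes.items():
        print(allele, probe, len(probe))
```

The two probes differ only at the central position, so under stringent conditions each is expected to hybridize to amplified DNA carrying its own allele.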

With the addition of such analogs as locked nucleic acids (LNAs), the size of primers and probes can be reduced to as few as 8 bases. LNAs are a novel class of bicyclic DNA analogs in which the 2′ and 4′ positions in the furanose ring are joined via an O-methylene (oxy-LNA), S-methylene (thio-LNA), or amino methylene (amino-LNA) moiety. Common to all of these LNA variants is an affinity toward complementary nucleic acids, which is by far the highest reported for a DNA analog. For example, particular all-oxy-LNA nonamers have been shown to have melting temperatures (Tm) of 64° C. and 74° C. when in complex with complementary DNA or RNA, respectively, as opposed to 28° C. for both DNA and RNA for the corresponding DNA nonamer. Substantial increases in Tm are also obtained when LNA monomers are used in combination with standard DNA or RNA monomers. For primers and probes, depending on where the LNA monomers are included (e.g., the 3′ end, the 5′ end, or in the middle), the Tm could be increased considerably.

In another embodiment, arrays of oligonucleotide probes that are complementary to target nucleic acid sequence segments from a subject can be used to identify polymorphisms in a nucleic acid associated with breast cancer (e.g., the polymorphic markers of Tables 12, 13 and 14 (SEQ ID NO:1-237), and markers in linkage disequilibrium therewith). For example, an oligonucleotide array can be used. Oligonucleotide arrays typically comprise a plurality of different oligonucleotide probes that are coupled to a surface of a substrate in different known locations. These oligonucleotide arrays, also described as “Genechips™,” have been generally described in the art (see, e.g., U.S. Pat. No. 5,143,854, PCT Patent Publication Nos. WO 90/15070 and 92/10092). These arrays can generally be produced using mechanical synthesis methods or light-directed synthesis methods that incorporate a combination of photolithographic methods and solid phase oligonucleotide synthesis methods, or by other methods known to the person skilled in the art (see, e.g., Bier, F. F., et al. Adv Biochem Eng Biotechnol 109:433-53 (2008); Hoheisel, J. D., Nat Rev Genet 7:200-10 (2006); Fan, J. B., et al. Methods Enzymol 410:57-73 (2006); Ragoussis, J. & Elvidge, G., Expert Rev Mol Diagn 6:145-52 (2006); Mockler, T. C., et al. Genomics 85:1-15 (2005), and references cited therein, the entire teachings of each of which are incorporated by reference herein). Many additional descriptions of the preparation and use of oligonucleotide arrays for detection of polymorphisms can be found, for example, in U.S. Pat. No. 6,858,394, U.S. Pat. No. 6,429,027, U.S. Pat. No. 5,445,934, U.S. Pat. No. 5,700,637, U.S. Pat. No. 5,744,305, U.S. Pat. No. 5,945,334, U.S. Pat. No. 6,054,270, U.S. Pat. No. 6,300,063, U.S. Pat. No. 6,733,977, U.S. Pat. No. 7,364,858, EP 619 321, and EP 373 203, the entire teachings of which are incorporated by reference herein.

Other methods of nucleic acid analysis that are available to those skilled in the art can be used to detect a particular allele at a polymorphic site associated with breast cancer. Representative methods include, for example, direct manual sequencing (Church and Gilbert, Proc. Natl. Acad. Sci. USA, 81:1991-1995 (1984); Sanger, F., et al., Proc. Natl. Acad. Sci. USA, 74:5463-5467 (1977); Beavis, et al., U.S. Pat. No. 5,288,644); automated fluorescent sequencing; single-stranded conformation polymorphism assays (SSCP); clamped denaturing gel electrophoresis (CDGE); denaturing gradient gel electrophoresis (DGGE) (Sheffield, V., et al., Proc. Natl. Acad. Sci. USA, 86:232-236 (1989)); mobility shift analysis (Orita, M., et al., Proc. Natl. Acad. Sci. USA, 86:2766-2770 (1989)); restriction enzyme analysis (Flavell, R., et al., Cell, 15:25-41 (1978); Geever, R., et al., Proc. Natl. Acad. Sci. USA, 78:5081-5085 (1981)); heteroduplex analysis; chemical mismatch cleavage (CMC) (Cotton, R., et al., Proc. Natl. Acad. Sci. USA, 85:4397-4401 (1988)); RNase protection assays (Myers, R., et al., Science, 230:1242-1246 (1985)); use of polypeptides that recognize nucleotide mismatches, such as E. coli mutS protein; and allele-specific PCR.

In another embodiment of the invention, diagnosis of breast cancer or a susceptibility to breast cancer can be made by examining expression and/or composition of a polypeptide encoded by a nucleic acid associated with breast cancer in those instances where the genetic marker(s) or haplotype(s) of the present invention result in a change in the composition or expression of the polypeptide. Thus, diagnosis of a susceptibility to breast cancer can be made by examining expression and/or composition of one of these polypeptides, or another polypeptide encoded by a nucleic acid associated with breast cancer, in those instances where the genetic marker or haplotype of the present invention results in a change in the composition or expression of the polypeptide (e.g., one or more of the FGF10, MRPS30, HCN1 and FGFR2 genes). The haplotypes and markers of the present invention that show association to breast cancer may play a role through their effect on one or more of these nearby genes. Possible mechanisms affecting these genes include, e.g., effects on transcription, effects on RNA splicing, alterations in relative amounts of alternative splice forms of mRNA, effects on RNA stability, effects on transport from the nucleus to cytoplasm, and effects on the efficiency and accuracy of translation.

Thus, in another embodiment, the variants (markers or haplotypes) of the invention showing association to breast cancer affect the expression of a nearby gene. It is well known that regulatory elements affecting gene expression may be located tens or even hundreds of kilobases away from the promoter region of a gene. By assaying for the presence or absence of at least one allele of at least one polymorphic marker of the present invention, it is thus possible to assess the expression level of such nearby genes. It is thus contemplated that the detection of the markers or haplotypes of the present invention can be used for assessing expression of one or more of the FGF10, MRPS30, HCN1 and FGFR2 genes.

A variety of methods can be used for detecting protein expression levels, including enzyme linked immunosorbent assays (ELISA), Western blots, immunoprecipitations and immunofluorescence. A test sample from a subject is assessed for the presence of an alteration in the expression and/or an alteration in composition of the polypeptide encoded by a nucleic acid associated with breast cancer. An alteration in expression of a polypeptide encoded by a nucleic acid associated with breast cancer can be, for example, an alteration in the quantitative polypeptide expression (i.e., the amount of polypeptide produced). An alteration in the composition of a polypeptide encoded by a nucleic acid associated with breast cancer is an alteration in the qualitative polypeptide expression (e.g., expression of a mutant polypeptide or of a different splicing variant). In one embodiment, diagnosis of a susceptibility to breast cancer is made by detecting a particular splicing variant encoded by a nucleic acid associated with breast cancer, or a particular pattern of splicing variants (e.g., the nucleic acids encoding the FGF10, MRPS30, and HCN1 genes).

Both such alterations (quantitative and qualitative) can also be present. An “alteration” in the polypeptide expression or composition, as used herein, refers to an alteration in expression or composition in a test sample, as compared to the expression or composition of the polypeptide in a control sample. A control sample is a sample that corresponds to the test sample (e.g., is from the same type of cells), and is from a subject who is not affected by, and/or who does not have a susceptibility to, breast cancer. In one embodiment, the control sample is from a subject that does not possess a marker allele or haplotype as described herein. Similarly, the presence of one or more different splicing variants in the test sample, or the presence of significantly different amounts of different splicing variants in the test sample, as compared with the control sample, can be indicative of a susceptibility to breast cancer. An alteration in the expression or composition of the polypeptide in the test sample, as compared with the control sample, can be indicative of a specific allele in the instance where the allele alters a splice site relative to the reference in the control sample. Various means of examining expression or composition of a polypeptide encoded by a nucleic acid are known to the person skilled in the art and can be used, including spectroscopy, colorimetry, electrophoresis, isoelectric focusing, and immunoassays (e.g., David et al., U.S. Pat. No. 4,376,110) such as immunoblotting (see, e.g., Current Protocols in Molecular Biology, particularly chapter 10, supra).

For example, in one embodiment, an antibody (e.g., an antibody with a detectable label) that is capable of binding to a polypeptide encoded by a nucleic acid associated with breast cancer can be used. Antibodies can be polyclonal or monoclonal. An intact antibody, or a fragment thereof (e.g., Fv, Fab, Fab′, F(ab′)₂) can be used. The term “labeled”, with regard to the probe or antibody, is intended to encompass direct labeling of the probe or antibody by coupling (i.e., physically linking) a detectable substance to the probe or antibody, as well as indirect labeling of the probe or antibody by reactivity with another reagent that is directly labeled. Examples of indirect labeling include detection of a primary antibody using a labeled secondary antibody (e.g., a fluorescently-labeled secondary antibody) and end-labeling of a DNA probe with biotin such that it can be detected with fluorescently-labeled streptavidin.

In one embodiment of this method, the level or amount of polypeptide encoded by a nucleic acid associated with breast cancer (e.g., FGF10, MRPS30, and HCN1) in a test sample is compared with the level or amount of the polypeptide in a control sample. A level or amount of the polypeptide in the test sample that is higher or lower than the level or amount of the polypeptide in the control sample, such that the difference is statistically significant, is indicative of an alteration in the expression of the polypeptide encoded by the nucleic acid, and is diagnostic for a particular allele or haplotype responsible for causing the difference in expression. Alternatively, the composition of the polypeptide in a test sample is compared with the composition of the polypeptide in a control sample. In another embodiment, both the level or amount and the composition of the polypeptide can be assessed in the test sample and in the control sample.
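The text does not specify which statistical test establishes a significant difference. As one possible choice, the Python sketch below applies an exact two-sided permutation test to the difference in mean polypeptide level between test and control replicates, enumerating every reassignment of the pooled measurements. The 0.05 significance level, the function names and the example measurements are assumptions; exact enumeration is practical only for small groups.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(test: list[float], control: list[float]) -> float:
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(test) + list(control)
    n_test = len(test)
    observed = abs(mean(test) - mean(control))
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n_test):
        in_group = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in in_group]
        # Small tolerance so the observed split always counts as "as extreme".
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


def is_altered(test: list[float], control: list[float], alpha: float = 0.05) -> bool:
    """Assumed decision rule: altered if the permutation p-value is below alpha."""
    return permutation_p_value(test, control) < alpha


if __name__ == "__main__":
    test_levels = [10.1, 10.4, 9.8, 10.6, 10.2]   # hypothetical units
    control_levels = [7.9, 8.3, 8.0, 7.6, 8.2]
    print(permutation_p_value(test_levels, control_levels))
    print(is_altered(test_levels, control_levels))
```

With five completely separated replicates per group, only the observed split and its mirror image are as extreme as the data, so p = 2/252 (about 0.008), and the difference would be called significant at the assumed 0.05 level.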

In another embodiment, the diagnosis of a susceptibility to breast cancer is made by detecting at least one marker or haplotype of the present invention (e.g., associated alleles of the markers listed in Tables 12, 13 and 14 (SEQ ID NO:1-237), and markers in linkage disequilibrium therewith), in combination with an additional protein-based, RNA-based or DNA-based assay. The methods of the invention can also be used in combination with an analysis of a subject's family history and risk factors (e.g., environmental risk factors, lifestyle risk factors).

Kits

Kits useful in the methods of the invention comprise components useful in any of the methods described herein, including, for example, primers for nucleic acid amplification, hybridization probes, restriction enzymes (e.g., for RFLP analysis), allele-specific oligonucleotides, antibodies that bind to an altered polypeptide encoded by a nucleic acid of the invention as described herein (e.g., a genomic segment comprising at least one polymorphic marker and/or haplotype of the present invention) or to a non-altered (native) polypeptide encoded by a nucleic acid of the invention as described herein, means for amplification of a nucleic acid associated with breast cancer, means for analyzing the nucleic acid sequence of a nucleic acid associated with breast cancer, means for analyzing the amino acid sequence of a polypeptide encoded by a nucleic acid associated with breast cancer, etc. The kits can, for example, include necessary buffers, nucleic acid primers for amplifying nucleic acids of the invention (e.g., one or more of the polymorphic markers as described herein), and reagents for allele-specific detection of the fragments amplified using such primers and necessary enzymes (e.g., DNA polymerase). Additionally, kits can provide reagents for assays to be used in combination with the methods of the present invention, e.g., reagents for use with breast cancer diagnostic assays.

In one embodiment, the invention is a kit for assaying a sample from a subject to detect the presence of breast cancer or a susceptibility to breast cancer in a subject, wherein the kit comprises reagents necessary for selectively detecting at least one allele of at least one polymorphism of the present invention in the genome of the individual. In a particular embodiment, the reagents comprise at least one contiguous oligonucleotide that hybridizes to a fragment of the genome of the individual comprising at least one polymorphism of the present invention. In another embodiment, the reagents comprise at least one pair of oligonucleotides that hybridize to opposite strands of a genomic segment obtained from a subject, wherein each oligonucleotide primer pair is designed to selectively amplify a fragment of the genome of the individual that includes at least one polymorphism, wherein the polymorphism is selected from the group consisting of the polymorphisms as listed in Tables 12, 13 and 14 (SEQ ID NO:1-237), and polymorphic markers in linkage disequilibrium therewith. In yet another embodiment the fragment is at least 20 base pairs in size. Such oligonucleotides or nucleic acids (e.g., oligonucleotide primers) can be designed using portions of the nucleic acid sequence flanking polymorphisms (e.g., SNPs or microsatellites) that are indicative of breast cancer. In another embodiment, the kit comprises one or more labeled nucleic acids capable of allele-specific detection of one or more specific polymorphic markers or haplotypes associated with breast cancer, and reagents for detection of the label. Suitable labels include, e.g., a radioisotope, a fluorescent label, an enzyme label, an enzyme co-factor label, a magnetic label, a spin label, or an epitope label.
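The primer-pair embodiment can be checked in silico before synthesis. The Python sketch below, a simplified exact-match model with hypothetical sequences and function names, locates the forward primer and the binding site of the reverse primer (its reverse complement) on the top strand, returns the predicted amplicon, and confirms that the product is at least 20 base pairs long and that the polymorphic site lies between the primers rather than under one of them.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon(template: str, fwd: str, rev: str) -> Optional[tuple[int, int]]:
    """(start, end) of the predicted product on the top strand, end exclusive,
    assuming exact primer matches; None if either primer site is missing."""
    top = template.upper()
    start = top.find(fwd.upper())
    if start == -1:
        return None
    rev_site = revcomp(rev)  # where the reverse primer anneals, read on the top strand
    site = top.find(rev_site, start + len(fwd))
    if site == -1:
        return None
    return start, site + len(rev_site)


def covers_snp(template: str, fwd: str, rev: str, snp_pos: int, min_len: int = 20) -> bool:
    """Product is at least min_len bp and the SNP lies between the two primers."""
    span = amplicon(template, fwd, rev)
    if span is None:
        return False
    start, end = span
    return end - start >= min_len and start + len(fwd) <= snp_pos < end - len(rev)


if __name__ == "__main__":
    fwd = "ATGCGTACGTTAGC"        # hypothetical forward primer
    rev_site = "CCAGTTCAGGCTAA"   # hypothetical reverse-primer site (top strand)
    template = "GG" + fwd + "CATGACTGGAT" + rev_site + "TT"
    rev = revcomp(rev_site)       # reverse primer as synthesized, 5'->3'
    print(amplicon(template, fwd, rev), covers_snp(template, fwd, rev, snp_pos=21))
```

Real primer design would additionally tolerate mismatches, check both strands and the whole genome for off-target products, and match primer melting temperatures.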

In particular embodiments, the polymorphic marker or haplotype to be detected by the reagents of the kit comprises one or more markers, two or more markers, three or more markers, four or more markers or five or more markers selected from the group consisting of the markers in Tables 12, 13 and 14. In another embodiment, the marker to be detected is selected from markers rs10941679, rs7703618, rs4415084, rs2067980, rs10035564, rs11743392, rs7716600 and rs1219648. In another embodiment, the marker or haplotype to be detected comprises at least one marker from the group of markers in strong linkage disequilibrium, as defined by values of r² greater than 0.2, to at least one of the group of markers consisting of the markers listed in Tables 12, 13 and 14. In yet another embodiment, the marker or haplotype to be detected comprises at least one marker selected from the group of markers consisting of markers rs10941679, rs7703618, rs4415084, rs2067980, rs10035564, rs11743392, rs7716600 and rs1219648, and markers in linkage disequilibrium therewith.
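Selecting markers in LD with a set of index markers reduces to a threshold filter once pairwise r² values are available. The Python sketch below assumes a precomputed table of r² values keyed by marker pair; the function name, marker names and r² values in the example are placeholders rather than data for the markers listed above, and the filter is deliberately not transitive (a marker linked only to another proxy is not selected).

```python
def select_proxies(
    r2_table: dict[tuple[str, str], float],
    index_markers: set[str],
    threshold: float = 0.2,
) -> set[str]:
    """Markers (other than the index markers) whose r^2 with at least one index
    marker is strictly greater than threshold."""
    proxies = set()
    for (m1, m2), r2 in r2_table.items():
        if r2 <= threshold:
            continue
        if m1 in index_markers and m2 not in index_markers:
            proxies.add(m2)
        elif m2 in index_markers and m1 not in index_markers:
            proxies.add(m1)
    return proxies


if __name__ == "__main__":
    # Placeholder marker names and r^2 values, for illustration only.
    table = {
        ("idx1", "p1"): 0.85,
        ("idx1", "p2"): 0.15,
        ("p3", "idx2"): 0.21,
        ("p1", "p4"): 0.90,   # p4 is linked only to a proxy, not to an index marker
        ("idx2", "p5"): 0.20,  # not strictly greater than 0.2
    }
    print(sorted(select_proxies(table, {"idx1", "idx2"})))
```

With these placeholder values only p1 and p3 pass the r² > 0.2 criterion; p5 sits exactly at the threshold and is excluded because the text requires values greater than 0.2.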

In one preferred embodiment, the kit for detecting the markers of the invention comprises a detection oligonucleotide probe that hybridizes to a segment of template DNA containing a SNP polymorphism to be detected, an enhancer oligonucleotide probe and an endonuclease. The detection oligonucleotide probe comprises a fluorescent moiety or group at its 3′ terminus and a quencher at its 5′ terminus, and is used together with the enhancer oligonucleotide as described by Kutyavin et al. (Nucleic Acids Res. 34:e128 (2006)). The fluorescent moiety can be Gig Harbor Green or Yakima Yellow, or other suitable fluorescent moieties. The detection probe is designed to hybridize to a short nucleotide sequence that includes the SNP polymorphism to be detected. Preferably, the SNP is anywhere from the terminal residue to −6 residues from the 3′ end of the detection probe. The enhancer is a short oligonucleotide probe which hybridizes to the DNA template 3′ relative to the detection probe. The probes are designed such that a single nucleotide gap exists between the detection probe and the enhancer oligonucleotide probe when both are bound to the template. The gap creates a synthetic abasic site that is recognized by an endonuclease, such as Endonuclease IV. The enzyme cleaves the dye off the fully complementary detection probe, but cannot cleave a detection probe containing a mismatch. Thus, by measuring the fluorescence of the released fluorescent moiety, the presence of a particular allele defined by the nucleotide sequence of the detection probe can be assessed.

The detection probe can be of any suitable size, although preferably the probe is relatively short. In one embodiment, the probe is from 5-100 nucleotides in length. In another embodiment, the probe is from 10-50 nucleotides in length, and in another embodiment, the probe is from 12-30 nucleotides in length. Other lengths of the probe are possible and within the scope of the skill of the average person skilled in the art.

In a preferred embodiment, the DNA template containing the SNP polymorphism is amplified by Polymerase Chain Reaction (PCR) prior to detection, and primers for such amplification are included in the reagent kit. In such an embodiment, the amplified DNA serves as the template for the detection probe and the enhancer probe.

Certain embodiments of the detection probe, the enhancer probe, and/or the primers used for amplification of the template by PCR include the use of modified bases, including modified A and modified G. The use of modified bases can be useful for adjusting the melting temperature of the nucleotide molecule (probe and/or primer) to the template DNA, for example for increasing the melting temperature in regions containing a low percentage of G or C bases, in which modified A with the capability of forming three hydrogen bonds to its complementary T can be used, or for decreasing the melting temperature in regions containing a high percentage of G or C bases, for example by using modified G bases that form only two hydrogen bonds to their complementary C base in a double-stranded DNA molecule. In a preferred embodiment, modified bases are used in the design of the detection nucleotide probe. Any modified base known to the skilled person can be selected in these methods, and the selection of suitable bases is well within the scope of the skilled person based on the teachings herein and known bases available from commercial sources as known to the skilled person.

In one of such embodiments, the presence of the marker or haplotype is indicative of a susceptibility (increased susceptibility or decreased susceptibility) to breast cancer. In another embodiment, the presence of the marker or haplotype is indicative of response to a breast cancer therapeutic agent. In another embodiment, the presence of the marker or haplotype is indicative of breast cancer prognosis. In yet another embodiment, the presence of the marker or haplotype is indicative of progress of breast cancer treatment. Such treatment may include intervention by surgery, medication or by other means (e.g., lifestyle changes).

Therapeutic Agents

Variants of the present invention (e.g., the markers and/or haplotypes of the invention, e.g., the markers listed in Tables 12, 13 and 14, e.g., rs4415084, rs10941679, rs1219648) can be used to identify novel therapeutic targets for breast cancer. For example, genes containing, or in linkage disequilibrium with, variants (markers and/or haplotypes) associated with breast cancer (e.g., one or more of the FGF10, MRPS30, HCN1 and FGFR2 genes), or their products, as well as genes or their products that are directly or indirectly regulated by or interact with these variant genes or their products, can be targeted for the development of therapeutic agents to treat breast cancer, or to prevent or delay onset of symptoms associated with breast cancer. Therapeutic agents may comprise one or more of, for example, small non-protein and non-nucleic acid molecules, proteins, peptides, protein fragments, nucleic acids (DNA, RNA), PNA (peptide nucleic acids), or their derivatives or mimetics which can modulate the function and/or levels of the target genes or their gene products.

The nucleic acids and/or variants of the invention, nucleic acids comprising one or more variants of the invention (e.g., nucleic acids with sequence as set forth in any one of SEQ ID NO:1-237, or fragments thereof) or nucleic acids comprising their complementary sequence, may be used as antisense constructs to control gene expression in cells, tissues or organs. The methodology associated with antisense techniques is well known to the skilled artisan, and is described and reviewed in Antisense Drug Technology: Principles, Strategies, and Applications, Crooke, ed., Marcel Dekker Inc., New York (2001). In general, antisense nucleic acid molecules are designed to be complementary to a region of mRNA expressed by a gene, so that the antisense molecule hybridizes to the mRNA, thus blocking translation of the mRNA into protein. Several classes of antisense oligonucleotide are known to those skilled in the art, including cleavers and blockers. The former bind to target RNA sites and activate intracellular nucleases (e.g., RNase H or RNase L) that cleave the target RNA. Blockers bind to target RNA and inhibit protein translation by steric hindrance of the ribosomes. Examples of blockers include nucleic acids, morpholino compounds, locked nucleic acids and methylphosphonates (Thompson, Drug Discovery Today, 7:912-917 (2002)). Antisense oligonucleotides are useful directly as therapeutic agents, and are also useful for determining and validating gene function, for example by gene knock-out or gene knock-down experiments. Antisense technology is further described in Lavery et al., Curr. Opin. Drug Discov. Devel. 6:561-569 (2003), Stephens et al., Curr. Opin. Mol. Ther. 5:118-122 (2003), Kurreck, Eur. J. Biochem. 270:1628-44 (2003), Dias et al., Mol. Cancer Ther. 1:347-55 (2002), Chen, Methods Mol. Med. 75:621-636 (2003), Wang et al., Curr. Cancer Drug Targets 1:177-96 (2001), and Bennett, Antisense Nucleic Acid Drug Dev. 12:215-24 (2002).

The variants described herein can be used for the selection and design of antisense reagents that are specific for particular variants. Using information about the variants described herein, antisense oligonucleotides or other antisense molecules that specifically target mRNA molecules containing one or more variants of the invention can be designed. In this manner, expression of mRNA molecules that contain one or more variants of the present invention (markers and/or haplotypes) can be inhibited or blocked. In one embodiment, the antisense molecules are designed to specifically bind a particular allelic form (i.e., one or several variants (alleles and/or haplotypes)) of the target nucleic acid, thereby inhibiting translation of a product originating from this specific allele or haplotype, while not binding other or alternate variants at the specific polymorphic sites of the target nucleic acid molecule.
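As a minimal sketch of allele-specific antisense design, the Python below takes a hypothetical mRNA sequence, checks that it carries the chosen allele at a given 0-based position, and returns the reverse complement (written 5′ to 3′) of a window centred on that position. The window length, the helper names `reverse_complement_rna` and `allele_specific_antisense`, and the example sequence are illustrative assumptions; practical design would also weigh target accessibility, backbone chemistry and off-target matches, none of which is modelled here.

```python
# Hypothetical sketch: allele-specific antisense oligonucleotide design.
# Sequences, window length and function names are illustrative assumptions.

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence, written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def allele_specific_antisense(mrna: str, variant_index: int, allele: str,
                              length: int = 15) -> str:
    """Antisense sequence complementary to a `length`-nt window of `mrna`
    that contains `allele` at the 0-based `variant_index`."""
    if mrna[variant_index] != allele:
        raise ValueError("mRNA does not carry the requested allele")
    # Centre the window on the variant, clamped at the 5' end of the mRNA.
    start = max(0, variant_index - length // 2)
    window = mrna[start:start + length]
    if len(window) < length:
        raise ValueError("variant too close to the 3' end of the sequence")
    return reverse_complement_rna(window)


# Toy mRNA carrying allele G at position 5 (hypothetical sequence).
print(allele_specific_antisense("AAAAAGAAAAA", 5, "G", length=5))  # UUCUU
```

An oligonucleotide built this way is fully complementary to the G-carrying transcript and has a single mismatch against a transcript with another base at that position; whether one mismatch gives adequate discrimination has to be established experimentally.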

As antisense molecules can be used to inactivate mRNA so as to inhibit gene expression, and thus protein expression, the molecules can be used to treat a disease or disorder, such as breast cancer. The methodology can involve cleavage by means of ribozymes containing nucleotide sequences complementary to one or more regions in the mRNA that attenuate the ability of the mRNA to be translated. Such mRNA regions include, for example, protein-coding regions, in particular protein-coding regions corresponding to catalytic activity, substrate and/or ligand binding sites, or other functional domains of a protein.

The phenomenon of RNA interference (RNAi) has been actively studied for the last decade, since its original discovery in C. elegans (Fire et al., Nature 391:806-11 (1998)), and in recent years its potential use in treatment of human disease has been actively pursued (reviewed in Kim & Rossi, Nature Rev. Genet. 8:173-184 (2007)). RNA interference (RNAi), also called gene silencing, is based on using double-stranded RNA molecules (dsRNA) to turn off specific genes. In the cell, cytoplasmic dsRNA molecules are processed by cellular complexes into small interfering RNA (siRNA). The siRNAs guide the targeting of a protein-RNA complex to specific sites on a target mRNA, leading to cleavage of the mRNA (Thompson, Drug Discovery Today, 7:912-917 (2002)). The siRNA molecules are typically about 20, 21, 22 or 23 nucleotides in length. Thus, one aspect of the invention relates to isolated nucleic acid molecules, and the use of those molecules for RNA interference, i.e., as small interfering RNA molecules (siRNA). In one embodiment, the isolated nucleic acid molecules are 18-26 nucleotides in length, preferably 19-25 nucleotides in length, more preferably 20-24 nucleotides in length, and more preferably 21, 22 or 23 nucleotides in length.

Another pathway for RNAi-mediated gene silencing originates in endogenously encoded primary microRNA (pri-miRNA) transcripts, which are processed in the cell to generate precursor miRNA (pre-miRNA). These pre-miRNA molecules are exported from the nucleus to the cytoplasm, where they undergo processing to generate mature miRNA molecules (miRNA), which direct translational inhibition by recognizing target sites in the 3′ untranslated regions of mRNAs, followed by mRNA degradation in processing bodies (P-bodies) (reviewed in Kim & Rossi, Nature Rev. Genet. 8:173-184 (2007)).

Clinical applications of RNAi include the incorporation of synthetic siRNA duplexes, which preferably are approximately 20-23 nucleotides in size, and preferably have 3′ overhangs of 2 nucleotides. Knockdown of gene expression is established by sequence-specific design for the target mRNA. Several commercial sites for optimal design and synthesis of such molecules are known to those skilled in the art.

Other applications provide longer siRNA molecules (typically 25-30 nucleotides in length, preferably about 27 nucleotides), as well as small hairpin RNAs (shRNAs; typically about 29 nucleotides in length). The latter are naturally expressed, as described in Amarzguioui et al. (FEBS Lett. 579:5974-81 (2005)). Chemically synthesized siRNAs and shRNAs are substrates for in vivo processing, and in some cases provide more potent gene silencing than shorter designs (Kim et al., Nature Biotechnol. 23:222-226 (2005); Siolas et al., Nature Biotechnol. 23:227-231 (2005)). In general, siRNAs provide transient silencing of gene expression, because their intracellular concentration is diluted by subsequent cell divisions. By contrast, expressed shRNAs mediate long-term, stable knockdown of target transcripts for as long as transcription of the shRNA takes place (Marques et al., Nature Biotechnol. 23:559-565 (2006); Brummelkamp et al., Science 296:550-553 (2002)).

Since RNAi molecules, including siRNA, miRNA and shRNA, act in a sequence-dependent manner, the variants of the present invention (e.g., the markers and haplotypes set forth in Tables 12, 13 and 14) can be used to design RNAi reagents that recognize specific nucleic acid molecules comprising specific alleles and/or haplotypes (e.g., the alleles and/or haplotypes of the present invention), while not recognizing nucleic acid molecules comprising other alleles or haplotypes. These RNAi reagents can thus recognize and destroy the target nucleic acid molecules. As with antisense reagents, RNAi reagents can be useful as therapeutic agents (i.e., for turning off disease-associated genes or disease-associated gene variants), but may also be useful for characterizing and validating gene function (e.g., by gene knock-out or gene knock-down experiments).
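The sequence dependence described above can be illustrated by enumerating every candidate siRNA target window that spans a variant position. The sketch below assumes a 19-nt duplex core with 2-nt 3′ "UU" overhangs on each strand (giving 21-nt strands, within the 20-23 nt range given for synthetic siRNA duplexes); the core length, the overhang sequence and the function name `sirna_candidates` are assumptions for illustration, and common design filters (GC content, strand asymmetry, off-target screening, position of the variant within the guide strand) are deliberately omitted.

```python
# Hypothetical sketch: list siRNA duplexes whose target window spans a variant.
# Core length (19 nt) and 'UU' overhangs are illustrative assumptions.
from typing import List, Tuple

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def sirna_candidates(mrna: str, variant_index: int, core: int = 19,
                     overhang: str = "UU") -> List[Tuple[int, str, str]]:
    """Return (start, sense, antisense) for each `core`-nt window of `mrna`
    containing the 0-based `variant_index`; each strand gets a 3' overhang."""
    first = max(0, variant_index - core + 1)
    last = min(variant_index, len(mrna) - core)
    candidates = []
    for start in range(first, last + 1):
        target = mrna[start:start + core]
        sense = target + overhang               # same sequence as the mRNA
        antisense = revcomp(target) + overhang  # guide strand, pairs with mRNA
        candidates.append((start, sense, antisense))
    return candidates


toy_mrna = "ACGU" * 7 + "AC"  # 30-nt hypothetical transcript
for start, sense, antisense in sirna_candidates(toy_mrna, variant_index=15)[:2]:
    print(start, sense, antisense)
```

Each returned guide strand is fully complementary to the allele present in the input transcript at the variant position; repeating the enumeration on the alternate-allele sequence would give the matching set for the other allele.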

Delivery of RNAi may be performed by a range of methodologies known to those skilled in the art. Methods utilizing non-viral delivery include cholesterol, stable nucleic acid-lipid particles (SNALP), heavy-chain antibody fragments (Fab), aptamers and nanoparticles. Viral delivery methods include use of lentivirus, adenovirus and adeno-associated virus. The siRNA molecules are in some embodiments chemically modified to increase their stability. This can include modifications at the 2′ position of the ribose, including 2′-O-methylpurines and 2′-fluoropyrimidines, which provide resistance to RNase activity. Other chemical modifications are possible and known to those skilled in the art.
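The 2′-modification rule mentioned above (2′-O-methyl on purines, 2′-fluoro on pyrimidines) can be expressed as a per-nucleotide mapping. The sketch below only labels each position of a hypothetical strand; it is not a synthesis specification, and practical modification patterns often differ (for example, overhangs or selected positions may be left unmodified). The function name and the label format are assumptions.

```python
# Hypothetical sketch: annotate an siRNA strand with the 2'-ribose
# modifications described in the text (purines -> 2'-O-methyl,
# pyrimidines -> 2'-fluoro). Label format is an illustrative assumption.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def modification_pattern(strand: str) -> list[str]:
    labels = []
    for base in strand.upper():
        if base in PURINES:
            labels.append(f"2'-OMe-{base}")
        elif base in PYRIMIDINES:
            labels.append(f"2'-F-{base}")
        else:
            raise ValueError(f"unexpected base {base!r}")
    return labels


print(modification_pattern("AGCU"))
# ["2'-OMe-A", "2'-OMe-G", "2'-F-C", "2'-F-U"]
```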

The following references provide a further summary of RNAi, and possibilities for targeting specific genes using RNAi: Kim & Rossi, Nat. Rev. Genet. 8:173-184 (2007), Chen & Rajewsky, Nat. Rev. Genet. 8:93-103 (2007), Reynolds et al., Nat. Biotechnol. 22:326-330 (2004), Chi et al., Proc. Natl. Acad. Sci. USA 100:6343-6346 (2003), Vickers et al., J. Biol. Chem. 278:7108-7118 (2003), Agami, Curr. Opin. Chem. Biol. 6:829-834 (2002), Lavery et al., Curr. Opin. Drug Discov. Devel. 6:561-569 (2003), Shi, Trends Genet. 19:9-12 (2003), Shuey et al., Drug Discov. Today 7:1040-46 (2002), McManus et al., Nat. Rev. Genet. 3:737-747 (2002), Xia et al., Nat. Biotechnol. 20:1006-10 (2002), Plasterk et al., Curr. Opin. Genet. Dev. 10:562-7 (2000), Bosher et al., Nat. Cell Biol. 2:E31-6 (2000), and Hunter, Curr. Biol. 9:R440-442 (1999).

A genetic defect leading to increased predisposition or risk for development of breast cancer, or a defect causing breast cancer, may be corrected permanently by administering to a subject carrying the defect a nucleic acid fragment that incorporates a repair sequence supplying the normal/wild-type nucleotide(s) at the site of the genetic defect. Such a site-specific repair sequence may encompass an RNA/DNA oligonucleotide that operates to promote endogenous repair of a subject's genomic DNA. The repair sequence may be administered in an appropriate vehicle, such as a complex with polyethylenimine, encapsulated in anionic liposomes, a viral vector such as an adenovirus vector, or other pharmaceutical compositions suitable for promoting intracellular uptake of the administered nucleic acid. The genetic defect may then be overcome, since the chimeric oligonucleotides induce the incorporation of the normal sequence into the genome of the subject, leading to expression of the normal/wild-type gene product. The replacement is propagated, thus rendering a permanent repair and alleviation of the symptoms associated with the disease or condition.

The present invention provides methods for identifying compounds or agents that can be used to treat breast cancer. Thus, the variants of the invention are useful as targets for the identification and/or development of therapeutic agents. In certain embodiments, such methods include assaying the ability of an agent or compound to modulate the activity and/or expression of a nucleic acid that includes at least one of the variants (markers and/or haplotypes) of the present invention, or the encoded product of the nucleic acid. This includes, for example, one or more of the FGF10, MRPS30, HCN1 and FGFR2 genes, and their gene products. This in turn can be used to identify agents or compounds that inhibit or alter the undesired activity or expression of the encoded nucleic acid product. Such assays can be performed in cell-based systems or in cell-free systems, as known to the skilled person. Cell-based systems include cells naturally expressing the nucleic acid molecules of interest, or recombinant cells that have been genetically modified so as to express a certain desired nucleic acid molecule.

Variant gene expression in a patient can be assessed by expression of a variant-containing nucleic acid sequence (for example, a gene containing at least one variant of the present invention, which can be transcribed into RNA containing the at least one variant, and in turn translated into protein), or by altered expression of a normal/wild-type nucleic acid sequence due to variants affecting the level or pattern of expression of the normal transcripts, for example variants in the regulatory or control region of the gene. Assays for gene expression include direct nucleic acid assays (mRNA), assays for expressed protein levels, or assays of collateral compounds involved in a pathway, for example a signal pathway. Furthermore, the expression of genes that are up- or down-regulated in response to the signal pathway can also be assayed.

One embodiment includes operably linking a reporter gene, such as luciferase, to the regulatory region of the gene(s) of interest.

Modulators of gene expression can in one embodiment be identified when a cell is contacted with a candidate compound or agent, and the expression of mRNA is determined. The expression level of mRNA in the presence of the candidate compound or agent is compared to the expression level in the absence of the compound or agent. Based on this comparison, candidate compounds or agents for treating breast cancer can be identified as those modulating the gene expression of the variant gene. When expression of mRNA or the encoded protein is statistically significantly greater in the presence of the candidate compound or agent than in its absence, then the candidate compound or agent is identified as a stimulator or up-regulator of expression of the nucleic acid. When nucleic acid expression or protein level is statistically significantly less in the presence of the candidate compound or agent than in its absence, then the candidate compound is identified as an inhibitor or down-regulator of the nucleic acid expression.
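The text does not fix a particular statistical test. As one possible choice, the sketch below uses an exact one-sided permutation test on the difference in mean expression between replicates with and without the candidate compound, then applies the stimulator/inhibitor rule described above. The significance threshold (0.05), the use of a permutation test, the function names and the replicate values are all assumptions; note that with very few replicates an exact permutation test cannot reach small p-values (four versus four replicates give a minimum one-sided p of 1/70).

```python
# Hypothetical sketch: classify a candidate compound from replicate expression
# measurements using an exact one-sided permutation test (an assumed choice).
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, control, alternative="greater"):
    """Exact permutation p-value for mean(treated) - mean(control).
    alternative='greater' tests for higher expression with the compound,
    alternative='less' tests for lower expression."""
    pooled = list(treated) + list(control)
    n = len(treated)
    observed = mean(treated) - mean(control)
    extreme = total = 0
    # Enumerate every way of relabelling n of the pooled values as 'treated'.
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        t = [pooled[i] for i in idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = mean(t) - mean(c)
        if alternative == "greater":
            extreme += diff >= observed
        else:
            extreme += diff <= observed
        total += 1
    return extreme / total


def classify_compound(treated, control, alpha=0.05):
    """Apply the stimulator / inhibitor rule at significance level alpha."""
    if permutation_p_value(treated, control, "greater") < alpha:
        return "stimulator"
    if permutation_p_value(treated, control, "less") < alpha:
        return "inhibitor"
    return "no significant effect"


# Hypothetical mRNA levels (arbitrary units), four replicates per condition.
print(classify_compound([10.0, 11.0, 12.0, 13.0], [1.0, 2.0, 3.0, 4.0]))
```

A parametric test (for example a t-test) would be an equally reasonable substitute when replicate numbers are larger; the permutation test is used here only because it needs no distributional assumptions and no third-party library.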

The invention further provides methods of treatment using a compound identified through drug (compound and/or agent) screening as a gene modulator (i.e., stimulator and/or inhibitor of gene expression).

Methods of Assessing Probability of Response to Therapeutic Agents, Methods of Monitoring Progress of Treatment and Methods for Treating Breast Cancer

As is known in the art, individuals can have differential responses to a particular therapy (e.g., a therapeutic agent or therapeutic method). The basis of the differential response may be genetically determined in part. Pharmacogenomics addresses the issue of how genetic variations (e.g., the variants (markers and/or haplotypes) of the present invention) affect drug response, due to altered drug disposition and/or abnormal or altered action of the drug. Clinical outcomes due to genetic variations affecting drug response may include toxicity of the drug in certain individuals (e.g., carriers or non-carriers of the genetic variants of the present invention), or therapeutic failure of the drug. Therefore, the variants of the present invention may determine the manner in which a therapeutic agent and/or method acts on the body, or the way in which the body metabolizes the therapeutic agent.

Accordingly, in one embodiment, the presence of a particular allele at a polymorphic site or haplotype is indicative of a different response rate to a particular treatment modality. This means that a patient diagnosed with breast cancer, and carrying a certain allele at a polymorphic marker or haplotype of the present invention (e.g., the at-risk and protective alleles and/or haplotypes of the invention), would respond better or worse to a specific therapeutic, drug and/or other therapy used to treat the disease. Therefore, the presence or absence of the marker allele or haplotype could aid in deciding what treatment should be used for the patient. For example, for a newly diagnosed patient, the presence of a marker or haplotype of the present invention may be assessed (e.g., through testing DNA derived from a blood sample, as described herein). If the patient is positive for a marker allele or haplotype (that is, at least one specific allele of the marker, or haplotype, is present), then the physician may recommend one particular therapy, while if the patient is negative for the at least one allele of a marker, or a haplotype, then a different course of therapy may be recommended (which may include recommending that no immediate therapy, other than serial monitoring for progression of the disease, be performed). Thus, the patient's carrier status could be used to help determine whether a particular treatment modality should be administered. The value lies in the possibility of diagnosing the disease at an early stage, selecting the most appropriate treatment, and providing information to the clinician about the prognosis/aggressiveness of the disease so that the most appropriate treatment can be applied.

As described further herein, current clinical preventive options for breast cancer are mainly chemopreventive (chemotherapy or hormonal therapy) and prophylactic surgery. The most common chemopreventive agents are tamoxifen and raloxifene; other options include other selective estrogen receptor modulators (SERMs) and aromatase inhibitors. Treatment options also include radiation therapy, from which a proportion of patients experience adverse symptoms. The markers of the invention, as described herein, may be used to assess response to these therapeutic options, or to predict the progress of therapy using any one of these treatment options. Thus, genetic profiling can be used to select the appropriate treatment strategy based on the genetic status of the individual, or it may be used to predict the outcome of the particular treatment option, and thus be useful in the strategic selection of treatment options or a combination of available treatment options.

The present invention also relates to methods of monitoring progress or effectiveness of a treatment for breast cancer. This can be done based on the genotype and/or haplotype status of the markers and haplotypes of the present invention, i.e., by assessing the absence or presence of at least one allele of at least one polymorphic marker as disclosed herein, or by monitoring expression of genes that are associated with the variants (markers and haplotypes) of the present invention. The risk gene mRNA or the encoded polypeptide can be measured in a tissue sample (e.g., a peripheral blood sample, or a biopsy sample). Expression levels and/or mRNA levels can thus be determined before and during treatment to monitor its effectiveness. Alternatively, or concomitantly, the genotype and/or haplotype status of at least one risk variant for breast cancer as presented herein is determined before and during treatment to monitor its effectiveness.

Alternatively, biological networks or metabolic pathways related to the markers and haplotypes of the present invention can be monitored by determining mRNA and/or polypeptide levels. This can be done, for example, by monitoring mRNA expression levels or polypeptide levels for several genes belonging to the network and/or pathway, in samples taken before and during treatment. Alternatively, metabolites belonging to the biological network or metabolic pathway can be determined before and during treatment. Effectiveness of the treatment is determined by comparing observed changes in expression levels/metabolite levels during treatment to corresponding data from healthy subjects.

In a further aspect, the markers of the present invention can be used to increase the power and effectiveness of clinical trials. Thus, individuals who are carriers of the at-risk variants of the present invention, i.e., individuals who are carriers of at least one allele of at least one polymorphic marker conferring increased risk of developing breast cancer, may be more likely to respond to a particular treatment modality. In one embodiment, individuals who carry at-risk variants for gene(s) in a pathway and/or metabolic network targeted by a particular treatment (e.g., a small molecule drug) are more likely to be responders to the treatment. In another embodiment, individuals who carry at-risk variants for a gene whose expression and/or function is altered by the at-risk variant are more likely to be responders to a treatment modality targeting that gene, its expression or its gene product.

In a further aspect, the markers and haplotypes of the present invention can be used for targeting the selection of pharmaceutical agents for specific individuals. Personalized selection of treatment modalities, lifestyle changes or a combination of the two can be realized by the utilization of the at-risk variants of the present invention. Thus, knowledge of an individual's status for particular markers of the present invention can be useful for selection of treatment options that target genes or gene products affected by the at-risk variants of the invention. Certain combinations of variants may be suitable for one selection of treatment options, while other gene variant combinations may target other treatment options. Such combinations may include one variant, two variants, three variants, or four or more variants, as needed to determine the selection of treatment modality with clinically reliable accuracy.

Computer-Implemented Aspects

The present invention also relates to computer-implemented applications of the polymorphic markers and haplotypes described herein to be associated with breast cancer. Such applications can be useful for storing, manipulating or otherwise analyzing genotype data that is useful in the methods of the invention. One example pertains to storing genotype information derived from an individual on readable media, so as to be able to provide the genotype information to a third party. The third party may be the individual from whom the genotype data is derived. The third party may also be a service provider for analyzing the genotype information, for example a service provider who calculates genetic risk based on the genotype of the individual at particular genetic markers. In one such embodiment, the service provider receives genotype information from a genotype service provider, and stores the genotype information on a readable medium for subsequent analysis. In another embodiment, the genotype provider is also the service provider, i.e., the same party generates genotypes from a DNA sample from an individual, stores the genotype data on a readable medium, and provides services relating to the risk assessment or other interpretation of the genotype data. The additional interpretation may, for example, include assessment or prediction of the ancestry of the individual, or the genealogical relationship between the individual and a reference individual. The reference individual may for example be a friend, relative or any other person with whom the individual wishes to compare his/her genotypes. In one particular embodiment, the genotype data is used to derive information about genetic risk factors contributing to increased susceptibility to breast cancer, and results based on such comparison are reported.

In one aspect, the invention relates to computer-readable media. In general terms, such a medium is capable of storing (i) identifier information for at least one polymorphic marker or a haplotype; (ii) an indicator of the frequency of at least one allele of said at least one marker, or the frequency of a haplotype, in individuals with breast cancer; and (iii) an indicator of the frequency of at least one allele of said at least one marker, or the frequency of a haplotype, in a reference population. The reference population can be a disease-free population of individuals. Alternatively, the reference population is a random sample from the general population, and is thus representative of the population at large. The frequency indicator may be a calculated frequency, a count of alleles and/or haplotype copies, or normalized or otherwise manipulated values of the actual frequencies that are suitable for the particular medium.
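A stored record holding the three items listed above might look like the following sketch, which keeps allele counts (one of the frequency indicators mentioned) for cases and for a reference population, and derives frequencies and an allelic odds ratio from them. The class and field names, the marker label `marker_1` and the counts are hypothetical and are not data from the study.

```python
# Hypothetical sketch of one stored record: marker identifier plus allele
# counts in cases and in a reference population. Names and counts are invented.
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerRecord:
    marker_id: str            # (i) identifier of the marker
    allele: str               # allele whose frequency is recorded
    case_allele_count: int    # (ii) copies of `allele` among cases
    case_total_alleles: int   #      total alleles typed in cases
    ref_allele_count: int     # (iii) copies of `allele` in the reference population
    ref_total_alleles: int    #       total alleles typed in the reference population

    @property
    def case_frequency(self) -> float:
        return self.case_allele_count / self.case_total_alleles

    @property
    def ref_frequency(self) -> float:
        return self.ref_allele_count / self.ref_total_alleles

    def allelic_odds_ratio(self) -> float:
        """(a/b) / (c/d) from the 2x2 table of allele counts."""
        a = self.case_allele_count
        b = self.case_total_alleles - a
        c = self.ref_allele_count
        d = self.ref_total_alleles - c
        return (a * d) / (b * c)


record = MarkerRecord("marker_1", "G", 400, 1000, 300, 1000)  # invented counts
print(record.case_frequency, record.ref_frequency,
      round(record.allelic_odds_ratio(), 3))
```

Storing raw counts rather than only frequencies keeps the sample sizes available, so that confidence intervals or re-normalized values can be derived later from the same record.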

Additional information about the individual can be stored on the medium, such as ancestry information, information about sex, physical attributes or characteristics (including height and weight), biochemical measurements (such as blood pressure, blood lipid levels, etc.), or other useful information that is desirable to store or manipulate in the context of the genotype status of a particular individual.

The invention furthermore relates to an apparatus that is suitable for determination or manipulation of genetic data useful for determining a susceptibility to breast cancer in a human individual. Such an apparatus can include a computer-readable memory, a routine for manipulating data stored on the computer-readable memory, and a routine for generating an output that includes a measure of the genetic data. Such measure can include values such as allelic or haplotype frequencies, genotype counts, sex, age, phenotype information, values for odds ratio (OR) or relative risk (RR), population attributable risk (PAR), or other useful information that is either a direct statistic of the original genotype data or based on calculations based on the genetic data.
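One of the listed outputs, the population attributable risk, is often computed with Levin's formula, PAR = p(RR - 1) / (1 + p(RR - 1)), where p is the proportion of the population carrying the risk factor and RR is the relative risk. The sketch below assumes that formula; whether p refers to carrier frequency or allele frequency, and whether an odds ratio is used in place of RR, are modelling choices the text leaves open. The function name and input values are hypothetical.

```python
# Hypothetical sketch: population attributable risk via Levin's formula.

def population_attributable_risk(exposure_freq: float, relative_risk: float) -> float:
    """PAR = p(RR - 1) / (1 + p(RR - 1)).

    exposure_freq: proportion of the population carrying the risk factor (p).
    relative_risk: risk in carriers relative to non-carriers (RR).
    """
    if not 0.0 <= exposure_freq <= 1.0:
        raise ValueError("exposure_freq must lie in [0, 1]")
    excess = exposure_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: half the population exposed, RR = 1.2.
print(round(population_attributable_risk(0.5, 1.2), 4))  # 0.0909
```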

The markers and haplotypes shown herein to be associated with increased susceptibility (e.g., increased risk) of breast cancer are in certain embodiments useful for interpretation and/or analysis of genotype data. Thus in certain embodiments, an identification of an at-risk allele for breast cancer, as shown herein, or an allele at a polymorphic marker in LD with any one of the markers shown herein to be associated with breast cancer, indicates that the individual from whom the genotype data originates is at increased risk of breast cancer. In one such embodiment, genotype data is generated for at least one polymorphic marker shown herein to be associated with breast cancer, or a marker in linkage disequilibrium therewith. The genotype data is subsequently made available to the individual from whom the data originates, for example via a user interface accessible over the internet, together with an interpretation of the genotype data, e.g., in the form of a risk measure (such as an absolute risk (AR), risk ratio (RR) or odds ratio (OR)) for the disease (e.g., breast cancer). In another embodiment, at-risk markers identified in a genotype dataset derived from an individual are assessed, and results from the assessment of the risk conferred by the presence of such at-risk variants in the dataset are made available to the individual, for example via a secure web interface, or by other communication means. The results of such risk assessment can be reported in numeric form (e.g., by risk values, such as absolute risk, relative risk, and/or an odds ratio, or by a percentage increase in risk compared with a reference), by graphical means, or by other means suitable to illustrate the risk to the individual from whom the genotype data is derived. In particular embodiments, the results of risk assessment are made available to a third party, e.g., a physician, other healthcare worker or genetic counselor.
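To report risk as a percentage change compared with a reference, one common approach is a multiplicative model in which each copy of the at-risk allele multiplies risk by a per-allele factor r, with genotype frequencies assumed to follow Hardy-Weinberg proportions. Risk relative to the population average is then r^k / (1 - p + p*r)^2 for k copies of an allele with frequency p. The sketch assumes this model, treats the odds ratio as an approximation to the relative risk, and uses hypothetical inputs; it is not presented as the method used for the markers of the invention.

```python
# Hypothetical sketch: genotype risk relative to the population average
# under a multiplicative model with Hardy-Weinberg genotype frequencies.

def risk_vs_population(copies: int, per_allele_rr: float, allele_freq: float) -> float:
    """Risk of a genotype with `copies` at-risk alleles, divided by the
    average risk in the population. The average (relative to non-carriers)
    is (1 - p)^2 + 2p(1 - p)r + p^2 r^2 = (1 - p + p*r)^2."""
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1 or 2")
    r, p = per_allele_rr, allele_freq
    population_average = (1.0 - p + p * r) ** 2
    return r ** copies / population_average


def percent_change(copies: int, per_allele_rr: float, allele_freq: float) -> float:
    """Percentage increase (positive) or decrease (negative) versus the average."""
    return 100.0 * (risk_vs_population(copies, per_allele_rr, allele_freq) - 1.0)


# Hypothetical per-allele RR of 1.2 and allele frequency of 0.5.
for k in (0, 1, 2):
    print(k, round(percent_change(k, 1.2, 0.5), 1))
```

Normalizing to the population average, rather than to non-carriers, lets the figure be read directly against an average risk; multiplying by an average absolute risk would turn it into an absolute-risk estimate under the same assumptions.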

Markers Useful in Various Aspects of the Invention

The above-described applications can all be practiced with the markers and haplotypes of the invention that have in more detail been described with respect to methods of assessing susceptibility to breast cancer. Thus, these applications can in general be reduced to practice using markers within the Chr5p12 and Chr10q26 genomic regions as defined herein, including markers as listed in Tables 12, 13 and 14, and markers in linkage disequilibrium therewith. In one embodiment, a marker useful in the various aspects and embodiments of the invention is selected from the markers set forth in Tables 12, 13 and 14 (SEQ ID NO:1-237). In one embodiment, the marker is selected from marker rs10941679, rs7703618, rs4415084, rs2067980, rs10035564, rs11743392, rs7716600, and rs1219648, and markers in linkage disequilibrium therewith. In another embodiment, the marker is selected from marker rs10941679, rs7703618, rs4415084, rs2067980, rs10035564, rs11743392, rs7716600 and rs1219648. In another embodiment, the marker is selected from rs10941679, and markers in linkage disequilibrium therewith. In one embodiment, the marker is selected from the markers set forth in Table 13. In another embodiment, the marker is selected from marker rs4415084, and markers in linkage disequilibrium therewith. In another embodiment, the marker is selected from the markers set forth in Table 12. In another embodiment, the marker is selected from marker rs1219648, and markers in linkage disequilibrium therewith. In another embodiment, the marker is selected from the markers set forth in Table 14. In another embodiment, the marker is rs4415084. In another embodiment, the marker is rs10941679. In another embodiment, the marker is rs1219648. In another embodiment, the marker is rs4415084 or rs10941679. In another embodiment, marker alleles conferring increased risk or susceptibility of breast cancer are selected from rs10941679 allele G, rs7703618 allele T, rs4415084 allele G, rs2067980 allele G, rs10035564 allele G, rs11743392 allele T, rs7716600 allele A, and rs1219648 allele G.
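Given the at-risk alleles listed above, an individual's genotype record can be summarised as the number of at-risk copies carried at each listed marker, with carrier status read off from those counts. The sketch below assumes genotypes are already called on the same strand as the alleles listed in this section; the dictionary layout, the function names and the example genotypes (including the placeholder marker `rs_other`) are hypothetical.

```python
# Sketch: count at-risk allele copies for the markers and alleles listed above.
# Assumes genotypes are reported on the same strand as these alleles.

RISK_ALLELES = {
    "rs10941679": "G",
    "rs7703618": "T",
    "rs4415084": "G",
    "rs2067980": "G",
    "rs10035564": "G",
    "rs11743392": "T",
    "rs7716600": "A",
    "rs1219648": "G",
}


def count_risk_alleles(genotypes: dict[str, tuple[str, str]]) -> dict[str, int]:
    """Map each listed marker in `genotypes` to its number of at-risk copies (0-2).
    Markers not in RISK_ALLELES are skipped."""
    counts = {}
    for marker, (allele_1, allele_2) in genotypes.items():
        risk = RISK_ALLELES.get(marker)
        if risk is None:
            continue
        counts[marker] = int(allele_1 == risk) + int(allele_2 == risk)
    return counts


def is_carrier(counts: dict[str, int]) -> bool:
    """True if at least one at-risk allele is present at any counted marker."""
    return any(n > 0 for n in counts.values())


# Hypothetical genotypes for one individual; "rs_other" is not a listed marker.
example = {"rs4415084": ("G", "A"), "rs1219648": ("A", "A"), "rs_other": ("C", "C")}
counts = count_risk_alleles(example)
print(counts, is_carrier(counts))
```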

Nucleic Acids and Polypeptides

The nucleic acids and polypeptides described herein can be used in methods and kits of the present invention, as described above. An “isolated” nucleic acid molecule, as used herein, is one that is separated from nucleic acids that normally flank the gene or nucleotide sequence (as in genomic sequences) and/or has been completely or partially purified from other transcribed sequences (e.g., as in an RNA library). For example, an isolated nucleic acid of the invention can be substantially isolated with respect to the complex cellular milieu in which it naturally occurs, or culture medium when produced by recombinant techniques, or chemical precursors or other chemicals when chemically synthesized. In some instances, the isolated material will form part of a composition (for example, a crude extract containing other substances), buffer system or reagent mix. In other circumstances, the material can be purified to essential homogeneity, for example as determined by polyacrylamide gel electrophoresis (PAGE) or column chromatography (e.g., HPLC). An isolated nucleic acid molecule of the invention can comprise at least about 50%, at least about 80% or at least about 90% (on a molar basis) of all macromolecular species present. With regard to genomic DNA, the term “isolated” also can refer to nucleic acid molecules that are separated from the chromosome with which the genomic DNA is naturally associated. For example, the isolated nucleic acid molecule can contain less than about 250 kb, 200 kb, 150 kb, 100 kb, 75 kb, 50 kb, 25 kb, 10 kb, 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of the nucleotides that flank the nucleic acid molecule in the genomic DNA of the cell from which the nucleic acid molecule is derived.

The nucleic acid molecule can be fused to other coding or regulatory sequences and still be considered isolated. Thus, recombinant DNA contained in a vector is included in the definition of “isolated” as used herein. Also, isolated nucleic acid molecules include recombinant DNA molecules in heterologous host cells or heterologous organisms, as well as partially or substantially purified DNA molecules in solution. “Isolated” nucleic acid molecules also encompass in vivo and in vitro RNA transcripts of the DNA molecules of the present invention. An isolated nucleic acid molecule or nucleotide sequence can include a nucleic acid molecule or nucleotide sequence that is synthesized chemically or by recombinant means. Such isolated nucleotide sequences are useful, for example, in the manufacture of the encoded polypeptide, as probes for isolating homologous sequences (e.g., from other mammalian species), for gene mapping (e.g., by in situ hybridization with chromosomes), or for detecting expression of the gene in tissue (e.g., human tissue), such as by Northern blot analysis or other hybridization techniques.

The invention also pertains to nucleic acid molecules that hybridize under high stringency hybridization conditions, such as for selective hybridization, to a nucleotide sequence described herein (e.g., nucleic acid molecules that specifically hybridize to a nucleotide sequence containing a polymorphic site associated with a marker or haplotype described herein). Such nucleic acid molecules can be detected and/or isolated by allele- or sequence-specific hybridization (e.g., under high stringency conditions). Stringency conditions and methods for nucleic acid hybridizations are well known to the skilled person (see, e.g., Current Protocols in Molecular Biology, Ausubel, F. et al., John Wiley & Sons (1998), and Kraus, M. and Aaronson, S., Methods Enzymol. 200:546-556 (1991)), the entire teachings of which are incorporated by reference herein.

The percent identity of two nucleotide or amino acid sequences can be determined by aligning the sequences for optimal comparison purposes (e.g., gaps can be introduced into the first sequence). The nucleotides or amino acids at corresponding positions are then compared, and the percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % identity = # of identical positions/total # of positions × 100). In certain embodiments, the length of a sequence aligned for comparison purposes is at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% of the length of the reference sequence. The actual comparison of the two sequences can be accomplished by well-known methods, for example, using a mathematical algorithm. A non-limiting example of such a mathematical algorithm is described in Karlin, S. and Altschul, S., Proc. Natl. Acad. Sci. USA, 90:5873-5877 (1993). Such an algorithm is incorporated into the NBLAST and XBLAST programs (version 2.0), as described in Altschul, S. et al., Nucleic Acids Res., 25:3389-3402 (1997). When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., NBLAST) can be used. See the website on the world wide web at ncbi.nlm.nih.gov. In one embodiment, parameters for sequence comparison can be set at score=100, wordlength=12, or can be varied (e.g., W=5 or W=20).
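Once an alignment has been produced (for example by one of the programs cited above), the identity formula itself is simple to apply. The sketch below assumes the two sequences are already aligned to equal length with '-' marking gaps, and counts gap columns in the total number of positions; other conventions, such as excluding gap columns, give different values. The function name and example sequences are hypothetical.

```python
# Sketch: % identity = identical positions / total positions x 100,
# computed over an existing pairwise alignment ('-' marks a gap).

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    # Gap columns count toward the total number of positions.
    return identical / len(aligned_a) * 100.0


print(round(percent_identity("ACGT-A", "ACGTTA"), 2))  # 83.33
```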

Other examples include the algorithm of Myers and Miller, CABIOS (1989); ADVANCE and ADAM as described in Torelli, A. and Robotti, C., Comput. Appl. Biosci. 10:3-5 (1994); and FASTA as described in Pearson, W. and Lipman, D., Proc. Natl. Acad. Sci. USA, 85:2444-48 (1988).

In another embodiment, the percent identity between two amino acid sequences can be determined using the GAP program in the GCG software package (Accelrys, Cambridge, UK).

The present invention also provides isolated nucleic acid molecules that contain a fragment or portion that hybridizes under highly stringent conditions to a nucleic acid that comprises, or consists of, a nucleotide sequence comprising the polymorphic markers listed in Table 12, Table 13 and Table 14 (SEQ ID NO:1-237), and the nucleotide sequence of the FGF10, MRPS30, HCN1 and FGFR2 genes; or a nucleotide sequence comprising, or consisting of, the complement of a nucleotide sequence comprising the polymorphic markers listed in Table 12, Table 13 and Table 14 (SEQ ID NO:1-237), and the nucleotide sequence of the FGF10, MRPS30, HCN1 and FGFR2 genes, wherein the nucleotide sequence comprises at least one polymorphic allele contained in the markers and haplotypes described herein. The nucleic acid fragments of the invention are at least about 15, at least about 18, 20, 23 or 25 nucleotides, and can be 30, 40, 50, 100, 200, 500, 1000, 10,000 or more nucleotides in length.

The nucleic acid fragments of the invention are used as probes or primers in assays such as those described herein. “Probes” or “primers” are oligonucleotides that hybridize in a base-specific manner to a complementary strand of a nucleic acid molecule. In addition to DNA and RNA, such probes and primers include peptide nucleic acids (PNA), as described in Nielsen, P. et al., Science 254:1497-1500 (1991). A probe or primer comprises a region of nucleotide sequence that hybridizes to at least about 15, typically about 20-25, and in certain embodiments about 40, 50 or 75, consecutive nucleotides of a nucleic acid molecule. In one embodiment, the probe or primer comprises at least one allele of at least one polymorphic marker or at least one haplotype described herein, or the complement thereof. In particular embodiments, a probe or primer can comprise 100 or fewer nucleotides; for example, in certain embodiments from 6 to 50 nucleotides, or, for example, from 12 to 30 nucleotides. In other embodiments, the probe or primer is at least 70% identical, at least 80% identical, at least 85% identical, at least 90% identical, or at least 95% identical, to the contiguous nucleotide sequence or to the complement of the contiguous nucleotide sequence. In another embodiment, the probe or primer is capable of selectively hybridizing to the contiguous nucleotide sequence or to the complement of the contiguous nucleotide sequence. Often, the probe or primer further comprises a label, e.g., a radioisotope, a fluorescent label, an enzyme label, an enzyme co-factor label, a magnetic label, a spin label, or an epitope label.
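The identity criterion for probes and primers can be sketched in Python as follows, assuming an ungapped comparison of the probe against every same-length window of a target sequence and of its reverse complement (the opposite strand). The function names and the example target are hypothetical; the 70% default mirrors the lowest identity embodiment above, and real probe design would also take hybridization conditions into account, which this sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T; other symbols pass through)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def best_window_identity(probe: str, target: str) -> float:
    """Highest ungapped % identity between `probe` and any window of the same
    length in `target` or in its reverse complement (the opposite strand)."""
    probe = probe.upper()
    n = len(probe)
    if n == 0 or n > len(target):
        raise ValueError("probe must be non-empty and no longer than target")
    best = 0.0
    for strand in (target.upper(), reverse_complement(target)):
        for start in range(len(strand) - n + 1):
            window = strand[start:start + n]
            matches = sum(p == t for p, t in zip(probe, window))
            best = max(best, 100.0 * matches / n)
    return best


def meets_identity_threshold(probe: str, target: str, threshold: float = 70.0) -> bool:
    """True if the probe reaches `threshold` % identity on either strand."""
    return best_window_identity(probe, target) >= threshold


if __name__ == "__main__":
    target = "ATGCGTACCTTAG"  # hypothetical target sequence
    print(best_window_identity("GGTAC", target))      # 100.0 (opposite strand)
    print(meets_identity_threshold("GGTAA", target))  # True (4 of 5 bases = 80%)
```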

The nucleic acid molecules of the invention, such as those described above, can be identified and isolated using standard molecular biology techniques well known to the skilled person. The amplified DNA can be labeled (e.g., radiolabeled) and used as a probe for screening a cDNA library derived from human cells. The cDNA can be derived from mRNA and contained in a suitable vector. Corresponding clones can be isolated, DNA can be obtained following in vivo excision, and the cloned insert can be sequenced in either or both orientations by art-recognized methods to identify the correct reading frame encoding a polypeptide of the appropriate molecular weight. Using these or similar methods, the polypeptide and the DNA encoding the polypeptide can be isolated, sequenced and further characterized.

In general, the isolated nucleic acid sequences of the invention can be used as molecular weight markers on Southern gels, and as chromosome markers that are labeled to map related gene positions. The nucleic acid sequences can also be used to compare with endogenous DNA sequences in patients to identify breast cancer or a susceptibility to breast cancer, and as probes, such as to hybridize and discover related DNA sequences or to subtract out known sequences from a sample (e.g., subtractive hybridization). The nucleic acid sequences can further be used to derive primers for genetic fingerprinting, to raise anti-polypeptide antibodies using immunization techniques, and/or as an antigen to raise anti-DNA antibodies or elicit immune responses.

Antibodies

Polyclonal antibodies and/or monoclonal antibodies that specifically bind one form of the gene product but not the other form of the gene product are also provided. Antibodies are also provided which bind a portion of either the variant or the reference gene product that contains the polymorphic site or sites. The term “antibody” as used herein refers to immunoglobulin molecules and immunologically active portions of immunoglobulin molecules, i.e., molecules that contain antigen-binding sites that specifically bind an antigen. A molecule that specifically binds to a polypeptide of the invention is a molecule that binds to that polypeptide or a fragment thereof, but does not substantially bind other molecules in a sample, e.g., a biological sample, which naturally contains the polypeptide. Examples of immunologically active portions of immunoglobulin molecules include F(ab) and F(ab′)₂ fragments, which can be generated by treating the antibody with an enzyme such as papain or pepsin, respectively. The invention provides polyclonal and monoclonal antibodies that bind to a polypeptide of the invention. The term “monoclonal antibody” or “monoclonal antibody composition”, as used herein, refers to a population of antibody molecules that contain only one species of an antigen binding site capable of immunoreacting with a particular epitope of a polypeptide of the invention. A monoclonal antibody composition thus typically displays a single binding affinity for a particular polypeptide of the invention with which it immunoreacts.

Polyclonal antibodies can be prepared as described above by immunizing a suitable subject with a desired immunogen, e.g., a polypeptide of the invention or a fragment thereof. The antibody titer in the immunized subject can be monitored over time by standard techniques, such as with an enzyme linked immunosorbent assay (ELISA) using immobilized polypeptide. If desired, the antibody molecules directed against the polypeptide can be isolated from the mammal (e.g., from the blood) and further purified by well-known techniques, such as protein A chromatography to obtain the IgG fraction. At an appropriate time after immunization, e.g., when the antibody titers are highest, antibody-producing cells can be obtained from the subject and used to prepare monoclonal antibodies by standard techniques, such as the hybridoma technique originally described by Kohler and Milstein, Nature 256:495-497 (1975), the human B cell hybridoma technique (Kozbor et al., Immunol. Today 4:72 (1983)), the EBV-hybridoma technique (Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., 1985, pp. 77-96) or trioma techniques. The technology for producing hybridomas is well known (see generally Current Protocols in Immunology (1994) Coligan et al., (eds.) John Wiley & Sons, Inc., New York, N.Y.). Briefly, an immortal cell line (typically a myeloma) is fused to lymphocytes (typically splenocytes) from a mammal immunized with an immunogen as described above, and the culture supernatants of the resulting hybridoma cells are screened to identify a hybridoma producing a monoclonal antibody that binds a polypeptide of the invention.

Any of the many well known protocols used for fusing lymphocytes and immortalized cell lines can be applied for the purpose of generating a monoclonal antibody to a polypeptide of the invention (see, e.g., Current Protocols in Immunology, supra; Galfre et al., Nature 266:550-552 (1977); R. H. Kennett, in Monoclonal Antibodies: A New Dimension In Biological Analyses, Plenum Publishing Corp., New York, N.Y. (1980); and Lerner, Yale J. Biol. Med. 54:387-402 (1981)). Moreover, the ordinarily skilled worker will appreciate that there are many variations of such methods that also would be useful.

As an alternative to preparing monoclonal antibody-secreting hybridomas, a monoclonal antibody to a polypeptide of the invention can be identified and isolated by screening a recombinant combinatorial immunoglobulin library (e.g., an antibody phage display library) with the polypeptide to thereby isolate immunoglobulin library members that bind the polypeptide. Kits for generating and screening phage display libraries are commercially available (e.g., the Pharmacia Recombinant Phage Antibody System, Catalog No. 27-9400-01; and the Stratagene SurfZAP™ Phage Display Kit, Catalog No. 240612). Additionally, examples of methods and reagents particularly amenable for use in generating and screening antibody display libraries can be found in, for example, U.S. Pat. No. 5,223,409; PCT Publication No. WO 92/18619; PCT Publication No. WO 91/17271; PCT Publication No. WO 92/20791; PCT Publication No. WO 92/15679; PCT Publication No. WO 93/01288; PCT Publication No. WO 92/01047; PCT Publication No. WO 92/09690; PCT Publication No. WO 90/02809; Fuchs et al., Bio/Technology 9:1370-1372 (1991); Hay et al., Hum. Antibod. Hybridomas 3:81-85 (1992); Huse et al., Science 246:1275-1281 (1989); and Griffiths et al., EMBO J. 12:725-734 (1993).

Additionally, recombinant antibodies, such as chimeric and humanized monoclonal antibodies, comprising both human and non-human portions, which can be made using standard recombinant DNA techniques, are within the scope of the invention. Such chimeric and humanized monoclonal antibodies can be produced by recombinant DNA techniques known in the art.

In general, antibodies of the invention (e.g., a monoclonal antibody) can be used to isolate a polypeptide of the invention by standard techniques, such as affinity chromatography or immunoprecipitation. A polypeptide-specific antibody can facilitate the purification of natural polypeptide from cells and of recombinantly produced polypeptide expressed in host cells. Moreover, an antibody specific for a polypeptide of the invention can be used to detect the polypeptide (e.g., in a cellular lysate, cell supernatant, or tissue sample) in order to evaluate the abundance and pattern of expression of the polypeptide. Antibodies can be used diagnostically to monitor protein levels in tissue as part of a clinical testing procedure, e.g., to determine the efficacy of a given treatment regimen. The antibody can be coupled to a detectable substance to facilitate its detection. Examples of detectable substances include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, beta-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material includes luminol; examples of bioluminescent materials include luciferase, luciferin, and aequorin; and examples of suitable radioactive material include ¹²⁵I, ¹³¹I, ³⁵S or ³H.

Antibodies may also be useful in pharmacogenomic analysis. In such embodiments, antibodies against variant proteins encoded by nucleic acids according to the invention, such as variant proteins that are encoded by nucleic acids that contain at least one polymorphic marker of the invention, can be used to identify individuals that require modified treatment modalities.

Antibodies can furthermore be useful for assessing expression of variant proteins in disease states, such as in active stages of a disease, or in an individual with a predisposition to a disease related to the function of the protein, in particular breast cancer. Antibodies specific for a variant protein of the present invention that is encoded by a nucleic acid that comprises at least one polymorphic marker or haplotype as described herein can be used to screen for the presence of the variant protein, for example to screen for a predisposition to breast cancer as indicated by the presence of the variant protein.

Antibodies can be used in other methods. Thus, antibodies are useful as diagnostic tools for evaluating proteins, such as variant proteins of the invention, in conjunction with analysis by electrophoretic mobility, isoelectric point, tryptic or other protease digest, or for use in other physical assays known to those skilled in the art. Antibodies may also be used in tissue typing. In one such embodiment, a specific variant protein has been correlated with expression in a specific tissue type, and antibodies specific for the variant protein can then be used to identify the specific tissue type.

Subcellular localization of proteins, including variant proteins, can also be determined using antibodies, and can be applied to assess aberrant subcellular localization of the protein in cells in various tissues. Such use can be applied in genetic testing, but also in monitoring a particular treatment modality. In the case where treatment is aimed at correcting the expression level or presence of the variant protein or aberrant tissue distribution or developmental expression of the variant protein, antibodies specific for the variant protein or fragments thereof can be used to monitor therapeutic efficacy.

Antibodies are further useful for inhibiting variant protein function, for example by blocking the binding of a variant protein to a binding molecule or partner. Such uses can also be applied in a therapeutic context in which treatment involves inhibiting a variant protein's function. An antibody can, for example, be used to block or competitively inhibit binding, thereby modulating (i.e., agonizing or antagonizing) the activity of the protein. Antibodies can be prepared against specific protein fragments containing sites required for specific function or against an intact protein that is associated with a cell or cell membrane. For administration in vivo, an antibody may be linked with an additional therapeutic payload, such as a radionuclide, an enzyme, an immunogenic epitope, or a cytotoxic agent, including bacterial toxins (e.g., diphtheria toxin) or plant toxins (e.g., ricin). The in vivo half-life of an antibody or a fragment thereof may be increased by pegylation through conjugation to polyethylene glycol.

The present invention further relates to kits for using antibodies in the methods described herein. This includes, but is not limited to, kits for detecting the presence of a variant protein in a test sample. One preferred embodiment comprises antibodies such as a labelled or labelable antibody and a compound or agent for detecting variant proteins in a biological sample, means for determining the amount or the presence and/or absence of variant protein in the sample, and means for comparing the amount of variant protein in the sample with a standard, as well as instructions for use of the kit.

The present invention will now be exemplified by the following non-limiting examples.

Example 1 Identification of Variants on Chromosome 5p12 that Associate with Risk of Breast Cancer

Introduction

Mutations in breast cancer susceptibility genes BRCA1 and BRCA2 account for 15-25% of the familial component of breast cancer risk [Easton, (1999), Breast Cancer Res, 1, 14-7; Balmain, et al., (2003), Nat Genet, 33 Suppl, 238-44]. Much of the genetic component of risk of breast cancer remains uncharacterized and is thought to arise from combinations of less penetrant variants that, individually, may be quite common [Pharoah, et al., (2002), Nat Genet, 31, 33-6]. Many searches for less penetrant breast cancer risk variants have been carried out using a candidate gene, case-control association approach. Findings from these studies have often proven difficult to replicate [Breast Cancer Association Consortium, (2006), J Natl Cancer Inst, 98, 1382-96]. Recently, common missense variants in two genes, CASP8 and TGFB1, have been shown to be associated with breast cancer risk by using well-powered, multi-center analyses [Cox, et al., (2007), Nat Genet, 39, 352-8]. These reports emphasize the importance of large-scale studies with adequate replication when the goal is to identify common variants conferring modest increases in the risk of breast cancer.

Results

Numerous Illumina SNPs in a Region on Chromosome 5p12 are Associated with an Increased Risk for Breast Cancer in Iceland

In order to search widely for alleles of common SNPs associated with breast cancer susceptibility, we carried out a genome-wide SNP association study using Illumina HumanHap300 microarray technology. Genotyping was carried out on approximately 1,600 Icelandic breast cancer patients and 11,563 controls. This discovery sample set was designated “Iceland 1”. After removing SNPs that failed quality control checks, 311,524 SNPs remained and were tested for association with breast cancer. The results were adjusted for relatedness among individuals and potential population stratification using the method of genomic control [Devlin and Roeder, (1999), Biometrics, 55, 997-1004] (see Methods). Signals were ranked by P-value. A set of SNPs from the same area on chromosome 5p12 occupied 39 of the top 50 ranks. The highest ranks occupied by SNPs located in 5p12 were ranks 5 through 9. The region of interest containing these SNPs extended from approximately chromosome 5 co-ordinate 44,094,392 bp (position of marker rs7704166; all co-ordinates herein are from NCBI Build 34) to the position of the last Illumina SNP before the centromere, namely rs10941803 at 46,393,984 bp. Results from genotyping of the Illumina SNPs in this region are presented in Table 1 and are presented graphically in FIG. 1.

In order to further investigate the signals related to one of the highly ranked SNPs, marker rs7703618 on chr5p12, we generated and validated a Centaurus assay for this SNP, designated SG05S3065.c1. This SNP assay was used to genotype an independent sample of approximately 591 Icelandic breast cancer patients and 1314 controls. This independent sample was designated Iceland 2. As shown in Table 2, the SNP showed a significant association with breast cancer in the Iceland 2 sample, confirming the original observations with Iceland 1. We have thus replicated the original finding observed in the Iceland 1 sample in the independent Iceland 2 sample. The joint P-value for Iceland 1 and Iceland 2 approached a level that would be considered genome-wide significant after applying the conservative Bonferroni correction for the 311,524 SNPs tested [Skol, et al., (2006), Nat Genet, 38, 209-13].

Association to Chromosome 5p12, Confirmed in CGEMS Data, is Genome-Wide Significant Following Joint Analysis with CGEMS

The Cancer Genetic Markers of Susceptibility (CGEMS) project of the U.S. National Cancer Institute has released data to the public domain on a genome-wide SNP association study for breast cancer susceptibility based on 1145 patients and 1142 controls genotyped with approximately 530,000 SNPs using the Illumina platform. These data are available at: https://caintegrator.nci.nih.gov/cgems/. The CGEMS project found no genome-wide significant signals in the 5p12 region. However, we noted that one SNP, namely rs4415084, had a P-value of 1.38E-07 when data from Iceland (Iceland 1 cohort) and the CGEMS data set were analyzed jointly, which is genome-wide significant after Bonferroni correction. This SNP had an unremarkable P-value of 2.21E-03 in the CGEMS data set, the genome-wide significance of the joint value being mostly carried by the Iceland 1 P-value of 9.02E-06. Thus, the CGEMS data, while not significant on their own, provide confirmation of our original observation of association to chromosome 5p12.
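The Bonferroni arithmetic behind these genome-wide significance statements can be reproduced with a short Python sketch. It assumes a family-wise error rate of 0.05, which the text does not state explicitly, and divides it by the 311,524 SNPs tested, giving a per-SNP threshold of about 1.6 × 10⁻⁷; the helper name `is_genome_wide_significant` is hypothetical.

```python
N_SNPS_TESTED = 311_524  # SNPs passing QC in the Iceland 1 genome-wide scan
ALPHA = 0.05             # assumed family-wise error rate (not stated in the text)

# Bonferroni: divide the family-wise error rate by the number of tests.
bonferroni_threshold = ALPHA / N_SNPS_TESTED


def is_genome_wide_significant(p_value: float,
                               threshold: float = bonferroni_threshold) -> bool:
    """True if a P-value falls below the Bonferroni-corrected threshold."""
    return p_value < threshold


if __name__ == "__main__":
    print(f"threshold = {bonferroni_threshold:.3e}")  # threshold = 1.605e-07
    for label, p in [("joint Iceland 1 + CGEMS", 1.38e-07),
                     ("CGEMS alone", 2.21e-03),
                     ("Iceland 1 alone", 9.02e-06)]:
        print(label, is_genome_wide_significant(p))  # True, False, False
```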

Numerous HapMap (Non-Illumina) SNP Markers could Show BC Risk Associations Through their Correlation with the Illumina SNPs that Showed an Association in the 5p12 Region

We contemplated that there may be allelic heterogeneity at this locus, that is, there may be more than one underlying at-risk variant present in the 5p12 region, correlated to different degrees with the Illumina SNP set tested and existing at different frequencies in different populations. This is based on the observation that there appear to be significant signals distal to the cluster of most significant markers (see FIG. 1). Furthermore, we have noted that in some cases association signals are very strong in the Iceland 1 material but are not strong in the CGEMS data: for example, the rs4415084 SNP described above gave a very strong signal in Iceland 1 but not in the CGEMS data. Similarly, the SNP that was tested successfully for replication in Iceland, rs7703618, gave a P-value of 6.93E-06 in Iceland 1 and only 2.37E-02 in the CGEMS data set. Given that there may be allelic heterogeneity in the 5p12 region, we define a set of HapMap SNPs that, through their correlations with signals that we observed in the Iceland 1 data set, could be used to detect all pathogenic mutations in the 5p12 region. In order to identify such a set of HapMap SNPs, we first identified a class of SNPs that gave P-values of 10E-3 or less in the Iceland 1 data set. We then split this set into equivalence classes, two SNPs belonging to the same equivalence class if they have an r² value of >0.8 between them. This resulted in a set of 6 equivalence classes, which we designated A to F. For each equivalence class we started with the SNP that gave the most significant signal in the class (which we denoted the “key” SNP); then, by reference to HapMap data, we identified all HapMap SNPs that were correlated with the key SNP by an r² value of 0.2 or greater and were not themselves represented on the Illumina Hap300 chip. Thus, using the Iceland 1 data we observed signals in several different equivalence classes, and that fact, in itself, provides evidence for allelic heterogeneity. These HapMap SNPs could, through their correlations with SNPs in one or more of the equivalence classes we identified, also be used to detect the same breast cancer risk associations we originally observed. The list of the key SNPs and their correlations with HapMap SNPs is shown in Table 4.
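One way to implement the equivalence-class procedure described above is sketched below in Python. It assumes that pairwise r² values are available as a lookup keyed by unordered SNP pairs, and it treats classes as connected components of the graph linking SNPs with r² > 0.8 (an assumed reading that makes the grouping transitive). The P-value cutoff is left as a parameter; the example reads the text's "10E-3" as 1×10⁻³ purely for illustration. All function names and the toy data are hypothetical, not values from the study.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

R2Table = Dict[FrozenSet[str], float]  # unordered SNP pair -> r²


def pair_r2(r2: R2Table, a: str, b: str) -> float:
    """r² between two SNPs, or 0.0 if the pair is not in the table."""
    return r2.get(frozenset((a, b)), 0.0)


def equivalence_classes(pvalues: Dict[str, float], r2: R2Table,
                        p_cutoff: float, r2_link: float = 0.8
                        ) -> List[Tuple[str, List[str]]]:
    """Group SNPs with P <= p_cutoff into classes linked by r² > r2_link.

    Returns (key SNP, members) pairs, where the key SNP has the smallest
    P-value in its class; classes are ordered by their key SNP's P-value.
    """
    snps = sorted(s for s, p in pvalues.items() if p <= p_cutoff)
    parent = {s: s for s in snps}  # union-find forest

    def find(s: str) -> str:
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for i, a in enumerate(snps):
        for b in snps[i + 1:]:
            if pair_r2(r2, a, b) > r2_link:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for s in snps:
        groups.setdefault(find(s), []).append(s)
    classes = [(min(m, key=pvalues.__getitem__), m) for m in groups.values()]
    return sorted(classes, key=lambda kv: pvalues[kv[0]])


def proxy_snps(key: str, candidates: Iterable[str], r2: R2Table,
               on_chip: Set[str], min_r2: float = 0.2) -> List[str]:
    """Candidate SNPs with r² >= min_r2 to the key SNP that are not on the chip."""
    return sorted(c for c in candidates
                  if c != key and c not in on_chip and pair_r2(r2, key, c) >= min_r2)


if __name__ == "__main__":
    # Hypothetical toy data.
    pvals = {"snpA": 1e-5, "snpB": 2e-5, "snpC": 5e-4, "snpD": 0.2}
    r2 = {frozenset(("snpA", "snpB")): 0.95, frozenset(("snpB", "snpC")): 0.30,
          frozenset(("snpA", "hm1")): 0.50, frozenset(("snpA", "hm2")): 0.10}
    print(equivalence_classes(pvals, r2, p_cutoff=1e-3))
    # [('snpA', ['snpA', 'snpB']), ('snpC', ['snpC'])]
    print(proxy_snps("snpA", ["hm1", "hm2", "snpB"], r2, on_chip=set(pvals)))  # ['hm1']
```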

FGF10 and MRPS30 are the Most Likely Candidate Genes in the 5p12 Region

FIG. 1 shows a plot of the association signals obtained in the 5p12 region superimposed on a map of recombination hotspots, chromosome bands, known genes, and recombination rates. Recombination hotspots and recombination rates were determined as described by McVean et al. [McVean, et al., (2004), Science, 304, 581-4]. A representation of r² values between HapMap SNPs in the region is also shown. It can be seen that there are three known genes of note in the region, FGF10, MRPS30, and HCN1, along with one poorly characterized gene, LOC441070. Two of these, FGF10 and MRPS30, are compelling candidates for an involvement in breast cancer predisposition.

As reviewed by Howard and Ashworth [Howard and Ashworth, (2006), PLoS Genet, 2, e112], FGF10 is required for normal embryonic development of the breast. FGF10 has been implicated as an oncogene in mouse models of breast cancer by MMTV insertional mutagenesis, and FGF10 is overexpressed in around 10% of human breast cancers [Theodorou, et al., (2004), Oncogene, 23, 6047-55]. As can be seen in FIG. 1, the FGF10 gene is separated from the main clusters of association signals by a recombination hotspot. However, key elements controlling regulation of FGF10 may be present in the region where the strong association signals occur. Alternatively, the association signals may be in linkage disequilibrium with pathogenic mutations within the FGF10 gene itself.

MRPS30 encodes a protein component of the 28S subunit of the mitochondrial ribosome. It is also known as programmed cell death protein 9 (PDCD9). This is the mammalian counterpart of the Gallus gallus pro-apoptotic protein p52. It has been shown to induce apoptosis and activate the stress-responsive JNK1 pathway in mammalian cells. The protein appears to function in apoptosis at least in part through the Bcl-2 pathway [Sun, et al., (1998), Gene, 208, 157-66; Carim, et al., (1999), Cytogenet Cell Genet, 87, 85-8; Cavdar Koc, et al., (2001), FEBS Lett, 492, 166-70]. Although it has not been implicated previously in breast cancer, its involvement in the above pathways suggests that genetic variants in MRPS30 may be involved in modifying breast cancer risk.

Methods

Patient and Control Selection

Approval for this study was granted by the National Bioethics Committee of Iceland and the Icelandic Data Protection Authority. Records of breast cancer diagnoses were obtained from the Icelandic Cancer Registry (ICR). The ICR contains all cases of invasive breast tumors and ductal or lobular carcinoma in-situ diagnosed in Iceland from Jan. 1, 1955. All people living in Iceland who had a diagnosis entered into the ICR up to the end of December 2005 were eligible to participate in the study. The ICR contained records of 4603 individuals diagnosed during this period. A prevalence cohort comprising all living patients (approximately 2840) was eligible for recruitment into the study. We obtained informed consent, a blood sample, and diagnostic information from 2210 patients, a participation rate of approximately 78%. Genotyping was successful on a total of 2190 patients for rs7703618. Further details of the recruitment of this patient group have been reported previously [Stacey, et al., (2006), PLoS Med, 3, e217].

The 12,904 Icelandic controls consisted of 846 individuals randomly selected from the Icelandic Genealogical Database and 12,058 individuals from other ongoing genome-wide association studies at deCODE. Individuals with a diagnosis of breast cancer in the ICR were excluded. Both males and females were included.

Illumina Genotyping

DNA samples were genotyped according to the manufacturer's instructions on Illumina Infinium HumanHap300 SNP bead microarrays (Illumina, San Diego, Calif., USA), containing 317,503 SNPs derived from Phase I of the International HapMap project. This chip provides about 75% genomic coverage in the Utah CEPH (CEU) HapMap samples for common SNPs at r² ≥ 0.8 [Barrett and Cardon, (2006), Nat Genet, 38, 659-62]. Of the total number of SNPs on the chip, 5979 were deemed unsuitable because they were monomorphic (i.e., the minor allele frequency in the combined patient and control set was less than 0.001), had low (<95%) yield, or showed a very significant distortion from Hardy-Weinberg equilibrium in the controls (P<1×10⁻¹⁰). All of these problematic SNPs were removed from the analysis. Thus 311,524 SNPs were used in the association analysis. Any chips with an overall call rate below 98% of the SNPs were also excluded from the genome-wide association analysis.
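The quality-control filters listed above can be expressed as a small Python sketch. The thresholds (minor allele frequency below 0.001, yield below 95%, Hardy-Weinberg P below 1×10⁻¹⁰ in controls, chip call rate below 98%) are taken from the text; the use of a 1-degree-of-freedom Chi-squared test for Hardy-Weinberg equilibrium, the data layout, and all names are assumptions, since the text does not specify which HWE test was applied.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SnpQcInput:
    name: str
    minor_allele_freq: float                 # in combined patients + controls
    call_yield: float                        # fraction of samples with a call
    control_genotypes: Tuple[int, int, int]  # (AA, AB, BB) counts in controls


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df Chi-squared test for Hardy-Weinberg equilibrium (assumed test)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0.0:
        return 1.0  # monomorphic in controls; caught by the MAF filter instead
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of Chi-squared(1 df)


def snp_passes_qc(s: SnpQcInput) -> bool:
    if s.minor_allele_freq < 0.001:  # treated as monomorphic
        return False
    if s.call_yield < 0.95:          # low yield
        return False
    if hwe_chi2_pvalue(*s.control_genotypes) < 1e-10:  # HWE distortion in controls
        return False
    return True


def chip_passes_qc(call_rate: float) -> bool:
    """Chips with an overall call rate below 98% are excluded."""
    return call_rate >= 0.98


if __name__ == "__main__":
    good = SnpQcInput("snp_ok", 0.40, 0.99, (360, 480, 160))
    bad_hwe = SnpQcInput("snp_hwe", 0.50, 0.99, (500, 0, 500))
    print(snp_passes_qc(good), snp_passes_qc(bad_hwe))  # True False
    print(chip_passes_qc(0.975))                        # False
```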

Centaurus SNP Genotyping

A Centaurus assay [Kutyavin, et al., (2006), Nucleic Acids Res, 34, e128] was designed for rs7703618 and validated by genotyping the HapMap CEU sample and comparing the genotypes with published data. The assay gave <1.5% mismatches with HapMap data. Table 5 shows the sequence context for the key SNPs discussed herein. Table 6 shows the description of the Centaurus assay for marker rs7703618 that was developed for genotyping in this study.
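The validation step, comparing assay genotypes with published HapMap genotypes, amounts to a mismatch-rate calculation, sketched here in Python. It assumes genotypes are reported on the same strand in both data sets and compares them as unordered allele pairs; the sample identifiers and genotypes in the example are hypothetical, and the 1.5% figure is used only as a comparison point.

```python
from typing import Dict, Optional

Genotype = Optional[str]  # e.g. "AG"; None for a failed call


def mismatch_rate(assay: Dict[str, Genotype], reference: Dict[str, Genotype]) -> float:
    """Fraction of samples, called in both data sets, whose genotypes disagree.

    Genotypes are compared as unordered allele pairs ("AG" == "GA") and are
    assumed to be reported on the same strand in both data sets.
    """
    shared = [s for s in assay
              if assay[s] is not None and reference.get(s) is not None]
    if not shared:
        raise ValueError("no samples genotyped in both data sets")
    mismatches = sum(sorted(assay[s]) != sorted(reference[s]) for s in shared)
    return mismatches / len(shared)


if __name__ == "__main__":
    # Hypothetical sample IDs and genotypes, not HapMap data.
    assay = {"NA001": "AG", "NA002": "GG", "NA003": "AA", "NA004": None}
    hapmap = {"NA001": "GA", "NA002": "GG", "NA003": "AG", "NA004": "AA"}
    rate = mismatch_rate(assay, hapmap)
    print(f"{rate:.3f}", rate < 0.015)  # 0.333 False
```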

Statistical Methods

We calculated the odds ratio (OR) of a SNP allele assuming the multiplicative model, i.e., assuming that the relative risks of the two alleles that a person carries multiply. Allelic frequencies rather than carrier frequencies are presented for the markers. The associated P-values were calculated with a standard likelihood ratio Chi-squared statistic as implemented in the NEMO software package [Gretarsdottir, et al., (2003), Nat Genet, 35, 131-8]. Confidence intervals were calculated assuming that the estimate of the OR has a log-normal distribution.
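The allelic odds ratio, its log-normal confidence interval and a likelihood ratio Chi-squared test on the 2×2 allele-count table can be sketched in Python as follows. This is a plain, unadjusted calculation and not the NEMO implementation: it ignores relatedness, so its P-values are not expected to match the reported, genomic-control-adjusted values. The example approximates allele counts for rs7703618 from the rounded frequencies in Table 1, which is why the odds ratio differs slightly from the tabulated 1.198. Function names are hypothetical.

```python
import math
from typing import Tuple


def allelic_test(a: float, b: float, c: float, d: float, z: float = 1.959964
                 ) -> Tuple[float, Tuple[float, float], float, float]:
    """Allelic 2x2 analysis under the multiplicative model.

    a, b: counts of the tested / other allele in cases
    c, d: counts of the tested / other allele in controls
    Returns (OR, 95% CI, likelihood ratio Chi-squared statistic, P-value).
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
          math.exp(math.log(odds_ratio) + z * se_log_or))

    # Likelihood ratio (G) statistic for independence in the 2x2 table, 1 df.
    total = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    g = 0.0
    for obs, i, j in ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1)):
        expected = row[i] * col[j] / total
        if obs > 0:
            g += obs * math.log(obs / expected)
    g *= 2.0
    p_value = math.erfc(math.sqrt(g / 2.0))  # two-sided, Chi-squared with 1 df
    return odds_ratio, ci, g, p_value


def counts_from_frequencies(freq_cases: float, n_cases: int,
                            freq_controls: float, n_controls: int
                            ) -> Tuple[float, float, float, float]:
    """Approximate allele counts from allele frequencies (2 alleles per person)."""
    return (freq_cases * 2 * n_cases, (1 - freq_cases) * 2 * n_cases,
            freq_controls * 2 * n_controls, (1 - freq_controls) * 2 * n_controls)


if __name__ == "__main__":
    # rs7703618, allele G: frequencies and sample sizes from Table 1.
    or_, (lo, hi), g, p = allelic_test(*counts_from_frequencies(0.396, 1659, 0.354, 11557))
    print(f"OR = {or_:.3f}")  # OR = 1.196
    print(f"95% CI = ({lo:.3f}, {hi:.3f}), unadjusted P = {p:.2e}")
```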

Some Icelandic patients and controls are related, both within and between groups, causing the Chi-squared test statistic to have a mean greater than one and a median larger than 0.675² (0.675² ≈ 0.455 being the median of a Chi-squared distribution with one degree of freedom). We estimated the inflation factor for Iceland 1 using a method of genomic control [Devlin and Roeder, (1999), Biometrics, 55, 997-1004] by calculating the average of the observed Chi-squared statistics for the genome-wide SNP set, which accounts for relatedness and for potential population stratification. For Iceland 2, which was not typed with a genome-wide set of markers, the inflation factor was estimated by simulating genotypes through the Icelandic genealogy [Grant, et al., (2006), Nat Genet, 38, 320-3]. The estimated inflation factors were 1.105 for Iceland 1 and 1.11 for Iceland 2. The estimated inflation factor for the joint analyses of the Iceland 1 and Iceland 2 sample sets was 1.08, obtained by simulation.
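The genomic control adjustment described above can be sketched in Python: the inflation factor is estimated as the mean of the genome-wide Chi-squared statistics (following the text; many implementations instead use the median statistic divided by about 0.455), each statistic is divided by it, and the P-value is recomputed from the Chi-squared distribution with one degree of freedom. The example statistic is illustrative, λ = 1.105 is the Iceland 1 value given in the text, and the function names are hypothetical.

```python
import math
from statistics import fmean
from typing import Iterable, Tuple


def chi2_1df_pvalue(x: float) -> float:
    """Upper-tail P-value of a Chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def inflation_factor(genome_wide_chi2: Iterable[float]) -> float:
    """Inflation factor estimated as the mean of the observed statistics,
    as described in the text (the expected mean under the null is 1)."""
    return fmean(genome_wide_chi2)


def genomic_control(chi2: float, lam: float) -> Tuple[float, float]:
    """Divide a statistic by the inflation factor and recompute its P-value."""
    adjusted = chi2 / lam
    return adjusted, chi2_1df_pvalue(adjusted)


if __name__ == "__main__":
    # Illustrative statistic; lambda = 1.105 is the Iceland 1 value in the text.
    adjusted, p = genomic_control(25.0, 1.105)
    print(f"adjusted chi2 = {adjusted:.2f}, P = {p:.2e}")
```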

All P-values are reported as two-sided.

TABLE 1 Association results for Illumina SNPs in the 5p12 region (Position: chromosome 5 co-ordinate in bp, NCBI Build 34):

SNP        Allele Position  P-value   OR     Cases  Frq.Cases  Controls  Frq.Controls
rs7704166  A      44094392  1.15E−01  1.063  1660   0.492      11561     0.477
rs6879107  A      44096806  6.07E−01  1.029  1659   0.856      11554     0.852
rs4334895  G      44105438  4.12E−01  1.037  1658   0.273      11555     0.266
rs6859263  G      44111594  9.65E−01  1.002  1628   0.170      10803     0.169
rs4242104  T      44122873  6.22E−01  1.028  1660   0.858      11563     0.854
rs4502833  C      44144266  6.53E−01  1.020  1625   0.304      11379     0.300
rs6871975  T      44166026  5.11E−01  1.029  1618   0.295      11263     0.289
rs4242107  A      44174584  7.26E−01  1.015  1660   0.278      11562     0.275
rs4242108  T      44174787  6.53E−01  1.025  1660   0.858      11563     0.854
rs4492117  C      44174878  3.54E−01  1.055  1660   0.136      11563     0.130
rs4596388  G      44194286  3.91E−01  1.053  1660   0.121      11562     0.116
rs4866869  G      44195892  7.44E−01  1.014  1660   0.309      11555     0.306
rs4296809  A      44232169  6.18E−01  1.028  1660   0.152      11552     0.149
rs4866880  A      44237263  8.77E−01  1.007  1660   0.302      11562     0.301
rs4866773  A      44264014  7.86E−01  1.015  1659   0.850      11537     0.848
rs1550939  G      44267213  7.20E−01  1.015  1660   0.299      11563     0.296
rs4643965  C      44270309  8.24E−01  1.010  1660   0.259      11562     0.257
rs10512836 T      44273926  6.14E−01  1.025  1660   0.794      11563     0.790
rs726941   C      44279279  9.02E−02  1.078  1660   0.271      11563     0.257
rs2053784  C      44286984  6.31E−01  1.021  1660   0.287      11562     0.283
rs7713769  G      44314707  3.23E−01  1.062  1660   0.884      11561     0.878
rs1011814  G      44381321  2.23E−01  1.051  1660   0.660      11563     0.649
rs11743802 T      44396655  2.16E−02  1.134  1656   0.857      11533     0.840
rs2121875  T      44411046  2.14E−01  1.052  1660   0.660      11563     0.649
rs1384449  A      44422561  7.95E−01  1.012  1649   0.763      11202     0.761
rs2973644  C      44429684  9.54E−01  1.003  1660   0.232      11562     0.232
rs10512852 C      44439070  1.82E−01  1.101  1660   0.923      11562     0.915
rs723166   C      44441516  8.00E−02  1.081  1658   0.740      11537     0.725
rs16901843 T      44448618  4.81E−01  1.036  1660   0.819      11561     0.814
rs4866898  A      44449132  1.00E+00  1.000  1660   0.146      11553     0.146
rs13357659 G      44478386  1.53E−01  1.059  1660   0.406      11562     0.392
rs1351637  G      44487204  8.87E−01  1.007  1660   0.197      11563     0.196
rs922853   G      44497553  9.46E−01  1.004  1660   0.108      11562     0.108
rs1120718  T      44512079  6.10E−02  1.089  1660   0.761      11563     0.746
rs1384450  C      44525645  3.55E−01  1.037  1660   0.585      11562     0.576
rs2062140  T      44541916  1.84E−01  1.112  1660   0.066      11563     0.060
rs17320222 A      44544792  1.76E−01  1.142  1657   0.960      11552     0.955
rs6866555  C      44584014  3.26E−01  1.052  1659   0.176      11553     0.169
rs4463187  G      44614156  1.43E−02  1.100  1660   0.532      11561     0.508
rs7708449  A      44614727  5.00E−02  1.084  1660   0.355      11561     0.337
rs6889804  T      44618310  3.31E−01  1.051  1660   0.176      11561     0.169
rs4642379  T      44620489  3.27E−01  1.052  1660   0.176      11562     0.169
rs6896299  G      44655907  3.37E−01  1.051  1658   0.176      11552     0.168
rs4529201  C      44659472  1.09E−02  1.105  1659   0.531      11539     0.506
rs4415084  T      44708016  9.02E−06  1.194  1660   0.415      11562     0.373
rs2218080  G      44759831  3.37E−05  1.181  1660   0.405      11562     0.366
rs11747159 T      44783211  7.62E−06  1.197  1658   0.397      11551     0.354
rs2330572  C      44786490  2.38E−05  1.184  1660   0.405      11561     0.365
rs994793   G      44788748  2.19E−05  1.185  1658   0.405      11560     0.365
rs6885754  A      44811554  9.68E−01  1.007  1660   0.014      11550     0.013
rs7712949  T      44815846  1.19E−05  1.193  1659   0.392      11547     0.350
rs11746980 A      44823379  1.98E−05  1.186  1660   0.405      11561     0.365
rs16901964 T      44828756  1.97E−05  1.188  1659   0.390      11560     0.350
rs727305   C      44841543  1.51E−05  1.191  1660   0.390      11516     0.349
rs10462081 A      44846166  2.13E−05  1.187  1657   0.390      11557     0.350
rs13183209 A      44849250  1.92E−05  1.188  1660   0.390      11555     0.350
rs13159598 G      44851427  3.31E−05  1.182  1644   0.402      11513     0.363
rs3761648  G      44853580  8.57E−06  1.202  1589   0.387      10844     0.344
rs13174122 C      44856241  3.63E−05  1.181  1658   0.390      11498     0.351
rs11746506 T      44858067  2.12E−05  1.187  1660   0.389      11561     0.350
rs12188871 A      44859505  1.50E−05  1.191  1657   0.390      11532     0.349
rs9637783  G      44865147  1.74E−05  1.190  1655   0.389      11520     0.349
rs4457089  T      44867237  2.06E−05  1.187  1660   0.389      11560     0.349
rs6867533  T      44872793  6.23E−06  1.200  1641   0.398      11355     0.355
rs6896350  C      44878072  2.09E−05  1.187  1660   0.389      11559     0.350
rs1371025  C      44879734  2.06E−05  1.187  1660   0.389      11557     0.349
rs6451775  G      44882289  2.21E−05  1.187  1660   0.389      11561     0.350
rs729599   C      44887761  2.21E−05  1.187  1660   0.389      11561     0.350
rs987394   T      44891879  2.16E−05  1.187  1660   0.389      11558     0.349
rs4440370  A      44898853  2.17E−05  1.187  1659   0.389      11559     0.350
rs4492119  A      44901115  7.18E−06  1.200  1645   0.387      11340     0.345
rs7703497  A      44902529  2.12E−05  1.187  1659   0.389      11559     0.349
rs4395640  T      44914601  8.63E−06  1.197  1652   0.395      11497     0.353
rs7716600  A      44920506  3.12E−05  1.214  1660   0.241      11560     0.208
rs4412123  T      44921789  1.10E−05  1.192  1660   0.409      11560     0.367
rs7705343  G      44925078  1.31E−05  1.190  1658   0.409      11558     0.368
rs4129642  G      44943630  1.11E−05  1.194  1655   0.396      11518     0.355
rs9790879  C      44945386  1.27E−05  1.191  1659   0.409      11561     0.368
rs10462084 G      44946850  2.31E−01  1.067  1659   0.155      11562     0.147
rs9791056  T      44949392  7.66E−06  1.197  1660   0.396      11562     0.354
rs6880275  T      44954436  7.76E−06  1.198  1648   0.395      11484     0.353
rs6870136  G      44956163  7.51E−06  1.197  1660   0.396      11559     0.354
rs6881563  C      44958354  7.28E−06  1.198  1660   0.396      11561     0.354
rs7703618  G      44960080  6.93E−06  1.198  1659   0.396      11557     0.354
rs10077814 C      44962290  1.24E−05  1.191  1659   0.407      11561     0.366
rs6451783  G      44963794  9.41E−06  1.195  1660   0.395      11563     0.353
rs4298259  G      44966212  9.35E−06  1.195  1660   0.395      11562     0.353
rs7728431  T      44968180  9.96E−06  1.195  1660   0.394      11559     0.353
rs12517690 A      44984794  9.56E−06  1.195  1660   0.395      11562     0.353
rs3935213  A      45006945  3.89E−02  1.112  1659   0.182      11561     0.167
rs6866995  C      45022348  3.77E−02  1.112  1660   0.182      11563     0.167
rs2067980  G      45027818  9.89E−04  1.200  1660   0.155      11555     0.132
rs3923826  C      45118278  2.49E−02  1.137  1660   0.870      11562     0.855
rs11743309 A      45167889  2.15E−01  1.067  1660   0.836      11559     0.826
rs12654948 T      45211216  1.28E−01  1.094  1651   0.878      11260     0.868
rs12515820 G      45238970  4.07E−02  1.126  1660   0.135      11558     0.122
rs12515179 C      45292596  4.86E−02  1.120  1660   0.138      11562     0.125
rs10512876 G      45295105  7.73E−01  1.025  1660   0.055      11561     0.054
rs10035564 G      45298001  1.79E−04  1.178  1650   0.291      11477     0.258
rs13180087 C      45311269  5.28E−02  1.127  1659   0.117      11563     0.105
rs4866929  A      45312090  9.34E−02  1.068  1660   0.484      11560     0.468
rs981782   T      45331219  2.28E−04  1.159  1601   0.458      10706     0.421
rs981782   T      45331219  1.05E−01  1.066  1660   0.470      11562     0.454
rs9790873  C      45337015  3.61E−02  1.133  1660   0.128      11554     0.115
rs9292918  G      45346536  4.21E−03  1.161  1656   0.176      11546     0.155
rs6895055  A      45366909  2.34E−03  1.172  1660   0.177      11551     0.155
rs6888352  A      45371416  2.72E−03  1.169  1659   0.177      11551     0.155
rs994092   G      45381260  2.49E−03  1.171  1659   0.177      11557     0.155
rs10473384 A      45389311  2.34E−03  1.172  1659   0.177      11560     0.155
rs1501357  G      45410376  2.61E−03  1.170  1660   0.177      11563     0.155
rs12517615 C      45412289  6.70E−02  1.190  1647   0.047      11286     0.040
rs1501362  T      45423708  2.85E−03  1.167  1660   0.179      11558     0.157
rs6451798  T      45433355  3.11E−03  1.166  1660   0.179      11560     0.157
rs6414906  C      45451822  9.05E−03  1.112  1658   0.375      11560     0.351
rs13162651 C      45455401  5.54E−02  1.081  1655   0.375      11363     0.357
rs1483310  G      45459859  7.27E−02  1.178  1655   0.051      11413     0.044
rs12659024 T      45463191  9.87E−01  1.001  1658   0.080      11549     0.080
rs6892290  G      45472638  1.08E−02  1.109  1660   0.375      11560     0.351
rs6451801  A      45484934  1.02E−02  1.110  1660   0.375      11560     0.351
rs13354798 C      45502578  1.16E−02  1.108  1660   0.377      11563     0.353
rs12651887 T      45515924  1.19E−01  1.076  1657   0.225      11549     0.212
rs1471683  A      45524926  7.07E−02  1.088  1660   0.232      11562     0.217
rs2337414  A      45614170  1.12E−01  1.100  1660   0.884      11562     0.874
rs1852598  G      45648477  2.93E−01  1.049  1660   0.245      11562     0.237
rs11743392 T      45660476  4.43E−04  1.150  1614   0.490      10826     0.455
rs1534391  T      45714539  6.74E−02  1.117  1652   0.884      11396     0.873
rs2879074  C      45761710  1.49E−02  1.103  1660   0.384      11560     0.361
rs4380674  C      45814898  2.50E−01  1.053  1660   0.259      11563     0.249
rs7447717  A      45815753  1.29E−02  1.105  1660   0.384      11559     0.361
rs4388219  A      45826251  1.41E−01  1.063  1660   0.328      11555     0.315
rs10941703 C      45827373  1.02E−01  1.069  1660   0.364      11561     0.349
rs10941704 G      45829007  2.99E−01  1.049  1657   0.245      11531     0.236
rs4551074  T      45838254  2.15E−01  1.057  1660   0.260      11561     0.249
rs13155321 G      45842047  9.36E−02  1.071  1660   0.367      11559     0.351
rs7733616  C      45844224
1.59E−011.063 1630 0.290 11372 0.278 rs10069793 G 45848652 6.09E−02 1.079 16500.367 11465 0.349 rs13176359 T 45859776 2.72E−01 1.050 1660 0.267 115520.258 rs4866973 A 45862187 2.03E−01 1.059 1657 0.259 11530 0.248rs4242126 A 45871874 2.80E−01 1.051 1660 0.245 11561 0.236 rs6865429 G45876009 9.87E−02 1.069 1659 0.366 11561 0.351 rs10461763 G 458847085.07E−01 1.075 1659 0.034 11557 0.032 rs7730617 A 45885804 1.53E−011.063 1659 0.294 11557 0.281 rs13175559 C 45900128 1.10E−01 1.067 16600.368 11560 0.353 rs11951003 A 45902630 1.09E−01 1.067 1659 0.368 115590.353 rs4331911 G 45904876 1.55E−01 1.060 1627 0.360 11297 0.347rs13340341 T 45906437 1.53E−01 1.060 1660 0.363 11556 0.349 rs12109205 G45922179 2.53E−01 1.051 1660 0.288 11559 0.278 rs4368738 T 459382891.67E−01 1.061 1659 0.294 11561 0.281 rs9637799 G 45948109 5.03E−011.075 1660 0.034 11560 0.032 rs6862657 A 45951498 1.03E−01 1.069 16560.366 11533 0.351 rs7443189 A 45958468 6.09E−01 1.057 1654 0.034 115190.032 rs11948152 C 46000128 1.89E−01 1.054 1644 0.580 11377 0.567rs7443384 G 46006394 1.01E−01 1.070 1660 0.345 11562 0.330 rs7701444 C46014363 1.05E−01 1.068 1660 0.369 11561 0.354 rs13352566 T 460424032.03E−01 1.059 1648 0.260 11420 0.249 rs4370277 G 46094169 2.40E−011.048 1636 0.402 11316 0.390 rs12173206 A 46121458 5.02E−01 1.076 16590.034 11551 0.032 rs10066479 G 46142988 1.85E−01 1.054 1660 0.401 115500.388 rs12515804 C 46144107 1.69E−01 1.056 1660 0.401 11561 0.388rs13175755 T 46145281 2.58E−01 1.083 1660 0.086 11563 0.080 rs7720482 T46151664 1.66E−01 1.057 1659 0.402 11559 0.389 rs4975890 C 461591392.75E−01 1.047 1656 0.330 11552 0.320 rs12690679 C 46206288 3.19E−011.040 1659 0.423 11556 0.414 rs12697527 T 46247605 2.98E−01 1.042 16580.419 11533 0.409 rs13168297 T 46273834 5.02E−01 1.028 1649 0.348 114380.342 rs4975957 A 46310803 2.97E−01 1.042 1660 0.420 11549 0.410rs12659648 C 46330355 4.96E−01 1.028 1652 0.349 11541 0.343 rs13355128 T46332613 3.32E−01 1.041 1614 0.649 10822 0.639 rs10941803 C 
463939843.56E−01 1.039 1613 0.343 11166 0.335 Shown are the SNP names, theidentity of the risk allele, the location (in NCBI Build 34 coordinates,the P−value and Odds Ratio (OR) for Breast Cancer association, thenumbers of individuals tested and the allele frequencies in the BreastCancer Case and the Control groups respectively.
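As a rough consistency check on the OR column, an allelic odds ratio can be approximated directly from the two allele-frequency columns as OR ≈ [f_case/(1 − f_case)] / [f_control/(1 − f_control)]. The Python sketch below applies this to the five key SNPs that head equivalence classes A-E in Table 3. It is an illustration under stated assumptions, not the analysis that produced the table. The frequencies are rounded to three decimals, and the reported ORs may come from a model with further adjustments, so agreement to within about 0.01 is all that should be expected. The function name `allelic_odds_ratio` and the `TABLE1_ROWS` list are hypothetical.

```python
"""Approximate allelic odds ratios from the case/control allele frequencies in Table 1 (sketch)."""


def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Return the odds of the risk allele in cases divided by its odds in controls."""
    if not (0.0 < freq_cases < 1.0 and 0.0 < freq_controls < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# (SNP, risk allele, case frequency, control frequency, OR reported in Table 1)
TABLE1_ROWS = [
    ("rs4415084", "T", 0.415, 0.373, 1.194),
    ("rs7703618", "G", 0.396, 0.354, 1.198),
    ("rs2067980", "G", 0.155, 0.132, 1.200),
    ("rs10035564", "G", 0.291, 0.258, 1.178),
    ("rs11743392", "T", 0.490, 0.455, 1.150),
]

if __name__ == "__main__":
    for snp, allele, f_case, f_ctrl, reported in TABLE1_ROWS:
        estimate = allelic_odds_ratio(f_case, f_ctrl)
        print(f"{snp} ({allele}): estimated OR {estimate:.3f}, reported {reported:.3f}")
```

The small residual differences (largest for rs2067980, about 0.006) are consistent with rounding of the published frequencies.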

TABLE 2 Replication of signal from SNP rs7703618 in an independent Icelandic Breast Cancer case/control sample. Frequencies are those of the rs7703618 G allele.
Cohort (Cases/Controls)        Freq. cases  Freq. controls  OR    P
Iceland 1 (1599/11558)         0.396        0.354           1.20  1.1E−05
Iceland 2 (591/1314)           0.392        0.353           1.18  2.9E−02
Iceland combined (2190/12872)  0.395        0.354           1.19  5.3E−07
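The combined Icelandic estimate in Table 2 can be approximated by pooling the two cohorts with a Mantel-Haenszel odds ratio over allele counts. This is a swapped-in textbook technique, not necessarily the method used for the combined row. The sketch assumes two alleles per individual and rebuilds approximate allele counts from the rounded frequencies, so it should only reproduce the reported combined OR of 1.19 to about two decimals. The `Cohort` class and the `mantel_haenszel_or` function are hypothetical names.

```python
"""Mantel-Haenszel pooled allelic odds ratio across the Table 2 cohorts (sketch)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Cohort:
    name: str
    n_cases: int
    n_controls: int
    freq_cases: float     # rs7703618 G allele frequency in cases
    freq_controls: float  # rs7703618 G allele frequency in controls

    def allele_table(self) -> tuple[float, float, float, float]:
        """Approximate 2x2 allele counts (a, b, c, d), assuming two alleles per person."""
        case_alleles = 2 * self.n_cases
        control_alleles = 2 * self.n_controls
        a = case_alleles * self.freq_cases                 # G alleles in cases
        b = case_alleles * (1 - self.freq_cases)           # other alleles in cases
        c = control_alleles * self.freq_controls           # G alleles in controls
        d = control_alleles * (1 - self.freq_controls)     # other alleles in controls
        return a, b, c, d


def mantel_haenszel_or(cohorts: list[Cohort]) -> float:
    """Pool per-cohort 2x2 tables: sum(a*d/n) / sum(b*c/n)."""
    numerator = 0.0
    denominator = 0.0
    for cohort in cohorts:
        a, b, c, d = cohort.allele_table()
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


ICELAND = [
    Cohort("Iceland 1", 1599, 11558, 0.396, 0.354),
    Cohort("Iceland 2", 591, 1314, 0.392, 0.353),
]

if __name__ == "__main__":
    print(f"Mantel-Haenszel OR: {mantel_haenszel_or(ICELAND):.3f}")
```

Stratifying by cohort, rather than simply adding the counts together, keeps differences between the two samples from being mistaken for an allele effect.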

TABLE 3 HapMap SNPs with r² values >0.2 in relation to key SNPs in equivalence classes A-F.
SNP 1  SNP 2  D′  r²  p-min
SNP A = rs4415084
rs4415084 rs7735881 1.000 1.000 3.35E−36
rs4415084 rs7723539 1.000 1.000 3.35E−36
rs4415084 rs4492118 1.000 1.000 5.45E−36
rs4415084 rs4463188 1.000 1.000 1.87E−35
rs4415084 rs920329 1.000 1.000 7.12E−36
rs4415084 rs7720551 1.000 1.000 3.35E−36
rs4415084 rs714130 1.000 1.000 3.35E−36
rs4415084 rs6874055 1.000 1.000 3.37E−36
rs4415084 rs6861560 1.000 1.000 3.35E−36
rs4415084 rs6451770 1.000 1.000 3.35E−36
rs4415084 rs4571480 1.000 1.000 1.43E−35
rs4415084 rs4419600 1.000 1.000 3.35E−36
rs4415084 rs4415085 1.000 1.000 3.35E−36
rs4415084 rs2218081 1.000 1.000 3.35E−36
rs4415084 rs2165010 1.000 1.000 3.37E−36
rs4415084 rs2165009 1.000 1.000 5.45E−36
rs4415084 rs2013513 1.000 1.000 1.43E−35
rs4415084 rs1821936 1.000 1.000 5.45E−36
rs4415084 rs1438825 1.000 1.000 3.35E−36
rs4415084 rs13156930 1.000 1.000 3.35E−36
rs4415084 rs12522626 1.000 1.000 8.86E−36
rs4415084 rs12515012 1.000 1.000 3.35E−36
rs4415084 rs12187196 1.000 1.000 3.35E−36
rs4415084 rs10941678 1.000 1.000 5.45E−36
rs4415084 rs4321755 1.000 1.000 3.35E−36
rs4415084 rs10941677 1.000 1.000 1.43E−35
rs4415084 rs10805685 1.000 1.000 5.45E−36
rs4415084 rs16901937 1.000 0.965 4.27E−34
rs4415084 rs920328 1.000 0.931 3.70E−32
rs4415084 rs7380559 0.923 0.766 1.13E−23
rs4415084 rs4518409 0.923 0.766 1.13E−23
rs4415084 rs1438821 0.923 0.766 1.13E−23
rs4415084 rs1438820 0.923 0.766 1.13E−23
rs4415084 rs13362132 0.923 0.766 1.13E−23
rs4415084 rs13160259 0.923 0.766 1.13E−23
rs4415084 rs11958808 0.923 0.766 1.13E−23
rs4415084 rs1061310 0.923 0.766 1.13E−23
rs4415084 rs10512865 0.923 0.766 1.13E−23
rs4415084 rs1048758 0.923 0.766 1.13E−23
rs4415084 rs10044096 0.923 0.766 1.13E−23
rs4415084 rs9292913 0.922 0.765 1.72E−23
rs4415084 rs13177711 0.922 0.765 1.72E−23
rs4415084 rs11949847 0.921 0.764 2.62E−23
rs4415084 rs4329028 0.923 0.763 2.84E−23
rs4415084 rs7716571 0.923 0.763 2.33E−23
rs4415084 rs7711697 0.922 0.762 1.14E−22
rs4415084 rs7380878 0.921 0.762 4.30E−23
rs4415084 rs10043344 0.921 0.761 6.56E−23
rs4415084 rs10040082 0.957 0.738 2.38E−22
rs4415084 rs7717459 0.919 0.706 1.84E−21
rs4415084 rs6893319 0.919 0.706 1.84E−21
rs4415084 rs6872254 0.919 0.706 1.84E−21
rs4415084 rs6451778 0.919 0.706 1.84E−21
rs4415084 rs4373287 0.919 0.706 1.84E−21
rs4415084 rs1866406 0.919 0.706 1.84E−21
rs4415084 rs1438822 0.919 0.706 1.84E−21
rs4415084 rs1438819 0.919 0.706 1.84E−21
rs4415084 rs13189120 0.919 0.706 1.84E−21
rs4415084 rs13155698 0.919 0.706 1.84E−21
rs4415084 rs13154781 0.919 0.706 1.84E−21
rs4415084 rs10462080 0.919 0.706 1.84E−21
rs4415084 rs10065638 0.919 0.706 1.84E−21
rs4415084 rs10059086 0.919 0.706 1.84E−21
rs4415084 rs10057521 0.919 0.706 1.84E−21
rs4415084 rs10053247 0.919 0.706 1.84E−21
rs4415084 rs10041518 0.919 0.706 1.84E−21
rs4415084 rs10040488 0.919 0.706 1.84E−21
rs4415084 rs10039866 0.919 0.706 1.84E−21
rs4415084 rs12513749 0.919 0.706 3.69E−21
rs4415084 rs6875933 0.919 0.706 3.67E−21
rs4415084 rs7736092 0.918 0.705 2.73E−21
rs4415084 rs10070037 0.918 0.705 2.73E−21
rs4415084 rs6871052 0.918 0.705 5.47E−21
rs4415084 rs7708506 0.917 0.704 4.05E−21
rs4415084 rs4642377 0.920 0.704 1.85E−21
rs4415084 rs4457088 0.919 0.703 2.72E−21
rs4415084 rs10038554 0.919 0.703 2.72E−21
rs4415084 rs3747479 0.919 0.701 3.77E−21
rs4415084 rs6894324 0.918 0.701 6.83E−21
rs4415084 rs9790896 0.880 0.700 8.38E−21
rs4415084 rs6875287 0.915 0.699 4.58E−20
rs4415084 rs6868232 0.918 0.697 1.16E−20
rs4415084 rs11741772 0.916 0.697 2.05E−20
rs4415084 rs9292914 0.904 0.694 6.03E−18
rs4415084 rs11951760 0.912 0.694 6.63E−20
rs4415084 rs12518851 0.908 0.685 2.88E−18
rs4415084 rs7715731 0.915 0.685 1.60E−19
rs4415084 rs1438827 0.881 0.675 2.82E−20
rs4415084 rs12651949 0.911 0.665 4.20E−16
rs4415084 rs11948186 0.914 0.649 1.74E−19
rs4415084 rs10051592 0.914 0.649 1.74E−19
rs4415084 rs16902086 0.802 0.559 2.08E−16
rs4415084 rs3935086 0.905 0.537 5.09E−16
rs4415084 rs10512875 0.901 0.517 6.18E−15
rs4415084 rs10941679 1.000 0.513 5.36E−17
rs4415084 rs4613718 1.000 0.454 3.93E−17
rs4415084 rs930395 1.000 0.402 1.56E−13
rs4415084 rs10044408 1.000 0.330 6.94E−11
rs4415084 rs6869488 0.856 0.287 6.68E−09
rs4415084 rs4460145 0.856 0.287 6.68E−09
rs4415084 rs7709262 0.847 0.275 5.02E−08
rs4415084 rs6874127 0.847 0.273 8.18E−08
rs4415084 rs13183434 1.000 0.266 2.69E−09
rs4415084 rs7716101 0.843 0.264 9.59E−08
rs4415084 rs7709661 0.843 0.264 9.59E−08
rs4415084 rs6894974 0.843 0.264 9.59E−08
rs4415084 rs6885307 0.843 0.264 9.59E−08
rs4415084 rs4533894 0.843 0.264 9.59E−08
rs4415084 rs12521639 0.843 0.264 9.59E−08
rs4415084 rs12054976 0.843 0.264 9.59E−08
rs4415084 rs7731099 0.839 0.264 1.12E−07
rs4415084 rs7701679 0.841 0.263 1.15E−07
rs4415084 rs6862655 0.739 0.262 3.65E−08
rs4415084 rs10059745 0.739 0.262 3.65E−08
rs4415084 rs6451796 0.840 0.255 1.80E−07
rs4415084 rs3923055 0.840 0.255 1.80E−07
rs4415084 rs1501361 0.840 0.255 1.80E−07
rs4415084 rs1392973 0.840 0.255 1.80E−07
rs4415084 rs6866354 0.734 0.254 5.71E−08
rs4415084 rs4639238 0.736 0.254 5.70E−08
rs4415084 rs12374507 0.736 0.254 5.70E−08
rs4415084 rs10066953 0.736 0.254 5.70E−08
rs4415084 rs4371761 0.839 0.252 2.37E−07
rs4415084 rs10054521 0.733 0.249 8.38E−08
rs4415084 rs4502832 0.838 0.246 1.74E−07
rs4415084 rs4485937 0.838 0.246 1.74E−07
rs4415084 rs4389695 0.838 0.246 1.74E−07
rs4415084 rs4296810 0.838 0.246 1.74E−07
rs4415084 rs10941692 0.838 0.246 1.74E−07
rs4415084 rs12522398 0.832 0.243 2.89E−07
rs4415084 rs4866900 0.923 0.232 4.31E−08
rs4415084 rs4493682 0.816 0.212 3.37E−06
rs4415084 rs4308490 0.811 0.210 4.58E−06
rs4415084 rs6893494 0.814 0.206 3.96E−06
rs4415084 rs12523157 0.814 0.206 3.96E−06
rs4415084 rs11954598 0.814 0.206 3.96E−06
rs4415084 rs7720104 0.809 0.205 5.34E−06
rs4415084 rs6864149 0.810 0.204 4.47E−06
rs4415084 rs983940 1.000 0.204 4.35E−09
rs4415084 rs6451767 1.000 0.204 4.35E−09
rs4415084 rs1482663 1.000 0.204 4.35E−09
rs4415084 rs1351633 1.000 0.204 4.35E−09
rs4415084 rs10079222 1.000 0.204 4.35E−09
rs4415084 rs12514414 0.804 0.202 7.18E−06
rs4415084 rs6876773 0.802 0.201 0.00001
rs4415084 rs7711446 0.802 0.201 0.000012
SNP B = rs7703618
rs7703618 rs9292914 1.000 1.000 1.09E−30
rs7703618 rs7717459 1.000 1.000 2.19E−35
rs7703618 rs7715731 1.000 1.000 3.83E−33
rs7703618 rs7736092 1.000 1.000 3.33E−35
rs7703618 rs7708506 1.000 1.000 3.33E−35
rs7703618 rs6875287 1.000 1.000 2.28E−34
rs7703618 rs10041518 1.000 1.000 2.19E−35
rs7703618 rs10039866 1.000 1.000 2.19E−35
rs7703618 rs10038554 1.000 1.000 9.82E−35
rs7703618 rs6894324 1.000 1.000 9.82E−35
rs7703618 rs6893319 1.000 1.000 2.19E−35
rs7703618 rs6875933 1.000 1.000 2.19E−35
rs7703618 rs6872254 1.000 1.000 2.19E−35
rs7703618 rs6871052 1.000 1.000 3.33E−35
rs7703618 rs6868232 1.000 1.000 1.90E−34
rs7703618 rs6451778 1.000 1.000 2.19E−35
rs7703618 rs4642377 1.000 1.000 6.45E−35
rs7703618 rs4457088 1.000 1.000 9.82E−35
rs7703618 rs4373287 1.000 1.000 2.19E−35
rs7703618 rs3747479 1.000 1.000 1.44E−34
rs7703618 rs1866406 1.000 1.000 2.19E−35
rs7703618 rs1438822 1.000 1.000 2.19E−35
rs7703618 rs1438819 1.000 1.000 2.19E−35
rs7703618 rs13189120 1.000 1.000 2.19E−35
rs7703618 rs13155698 1.000 1.000 2.19E−35
rs7703618 rs13154781 1.000 1.000 2.19E−35
rs7703618 rs12651949 1.000 1.000 1.89E−31
rs7703618 rs12518851 1.000 1.000 5.31E−33
rs7703618 rs12513749 1.000 1.000 2.19E−35
rs7703618 rs11951760 1.000 1.000 1.18E−33
rs7703618 rs11741772 1.000 1.000 3.34E−34
rs7703618 rs10462080 1.000 1.000 2.19E−35
rs7703618 rs10070037 1.000 1.000 3.33E−35
rs7703618 rs10065638 1.000 1.000 2.19E−35
rs7703618 rs10059086 1.000 1.000 2.19E−35
rs7703618 rs10057521 1.000 1.000 2.19E−35
rs7703618 rs10053247 1.000 1.000 2.19E−35
rs7703618 rs10040488 1.000 1.000 2.19E−35
rs7703618 rs10040082 1.000 1.000 1.50E−34
rs7703618 rs1438827 1.000 0.964 2.55E−33
rs7703618 rs7711697 1.000 0.929 2.12E−31
rs7703618 rs11958808 1.000 0.929 7.53E−32
rs7703618 rs10044096 1.000 0.929 7.53E−32
rs7703618 rs7380559 1.000 0.929 7.53E−32
rs7703618 rs4518409 1.000 0.929 7.53E−32
rs7703618 rs4329028 1.000 0.929 2.12E−31
rs7703618 rs1438821 1.000 0.929 7.53E−32
rs7703618 rs1438820 1.000 0.929 7.53E−32
rs7703618 rs13362132 1.000 0.929 7.53E−32
rs7703618 rs13160259 1.000 0.929 7.53E−32
rs7703618 rs1061310 1.000 0.929 7.53E−32
rs7703618 rs10512865 1.000 0.929 7.53E−32
rs7703618 rs1048758 1.000 0.929 7.53E−32
rs7703618 rs7716571 1.000 0.929 4.57E−31
rs7703618 rs9292913 1.000 0.928 1.15E−31
rs7703618 rs7380878 1.000 0.928 3.22E−31
rs7703618 rs13177711 1.000 0.928 1.15E−31
rs7703618 rs10043344 1.000 0.928 4.91E−31
rs7703618 rs11949847 1.000 0.928 1.75E−31
rs7703618 rs9790896 0.962 0.858 7.84E−27
rs7703618 rs920328 0.923 0.764 1.31E−23
rs7703618 rs4571480 0.922 0.731 2.33E−22
rs7703618 rs11948186 0.883 0.723 1.25E−21
rs7703618 rs10051592 0.883 0.723 1.25E−21
rs7703618 rs7735881 0.921 0.708 7.34E−22
rs7703618 rs7723539 0.921 0.708 7.34E−22
rs7703618 rs714130 0.921 0.708 7.34E−22
rs7703618 rs6861560 0.921 0.708 7.34E−22
rs7703618 rs6451770 0.921 0.708 7.34E−22
rs7703618 rs4419600 0.921 0.708 7.34E−22
rs7703618 rs4415085 0.921 0.708 7.34E−22
rs7703618 rs4321755 0.921 0.708 7.34E−22
rs7703618 rs2218081 0.921 0.708 7.34E−22
rs7703618 rs1438825 0.921 0.708 7.34E−22
rs7703618 rs13156930 0.921 0.708 7.34E−22
rs7703618 rs12515012 0.921 0.708 7.34E−22
rs7703618 rs12187196 0.921 0.708 7.34E−22
rs7703618 rs7720551 0.920 0.707 1.46E−21
rs7703618 rs6874055 0.919 0.706 1.84E−21
rs7703618 rs2165010 0.919 0.706 1.84E−21
rs7703618 rs920329 0.920 0.706 2.91E−21
rs7703618 rs10805685 0.921 0.706 1.09E−21
rs7703618 rs1821936 0.921 0.705 2.18E−21
rs7703618 rs10941678 0.921 0.705 2.18E−21
rs7703618 rs4463188 0.918 0.704 3.64E−21
rs7703618 rs2013513 0.919 0.704 2.72E−21
rs7703618 rs10941677 0.919 0.704 2.72E−21
rs7703618 rs4492118 0.921 0.703 1.62E−21
rs7703618 rs2165009 0.921 0.703 1.62E−21
rs7703618 rs12522626 0.921 0.703 1.62E−21
rs7703618 rs16901937 0.920 0.682 4.51E−21
rs7703618 rs16902086 0.812 0.635 1.69E−18
rs7703618 rs3935086 0.865 0.570 1.75E−16
rs7703618 rs10512875 0.865 0.570 1.75E−16
rs7703618 rs930395 1.000 0.482 9.85E−16
rs7703618 rs10941679 0.842 0.435 2.26E−12
rs7703618 rs4613718 1.000 0.384 5.94E−15
rs7703618 rs4502832 0.925 0.349 3.00E−10
rs7703618 rs4485937 0.925 0.349 3.00E−10
rs7703618 rs4389695 0.925 0.349 3.00E−10
rs7703618 rs10941692 0.925 0.349 3.00E−10
rs7703618 rs12522398 0.923 0.349 5.27E−10
rs7703618 rs6869488 0.865 0.342 3.96E−10
rs7703618 rs4460145 0.865 0.342 3.96E−10
rs7703618 rs10044408 0.917 0.334 7.15E−09
rs7703618 rs7731099 0.856 0.327 2.53E−09
rs7703618 rs7716101 0.856 0.317 3.51E−09
rs7703618 rs7709661 0.856 0.317 3.51E−09
rs7703618 rs6894974 0.856 0.317 3.51E−09
rs7703618 rs6885307 0.856 0.317 3.51E−09
rs7703618 rs4533894 0.856 0.317 3.51E−09
rs7703618 rs12521639 0.856 0.317 3.51E−09
rs7703618 rs12054976 0.856 0.317 3.51E−09
rs7703618 rs7701679 0.852 0.315 8.33E−09
rs7703618 rs4371761 0.852 0.305 9.59E−09
rs7703618 rs4296810 0.851 0.296 7.62E−09
rs7703618 rs1909937 0.911 0.282 2.53E−08
rs7703618 rs16902068 0.911 0.282 2.53E−08
rs7703618 rs1472584 0.911 0.282 2.53E−08
rs7703618 rs1392970 0.911 0.282 2.53E−08
rs7703618 rs12523398 0.911 0.282 2.53E−08
rs7703618 rs12521953 0.911 0.282 2.53E−08
rs7703618 rs12516488 0.911 0.282 2.53E−08
rs7703618 rs12514615 0.911 0.282 2.53E−08
rs7703618 rs12153189 0.911 0.282 2.53E−08
rs7703618 rs12153053 0.911 0.282 2.53E−08
rs7703618 rs11953498 0.911 0.282 2.53E−08
rs7703618 rs10941693 0.911 0.282 2.53E−08
rs7703618 rs4357042 0.910 0.282 2.95E−08
rs7703618 rs12523359 0.910 0.282 2.95E−08
rs7703618 rs12522305 0.910 0.282 2.95E−08
rs7703618 rs4533895 0.909 0.281 3.44E−08
rs7703618 rs6898476 0.907 0.281 4.74E−08
rs7703618 rs6451796 0.786 0.267 1.07E−07
rs7703618 rs3923055 0.786 0.267 1.07E−07
rs7703618 rs1501361 0.786 0.267 1.07E−07
rs7703618 rs1392973 0.786 0.267 1.07E−07
rs7703618 rs4566805 0.903 0.266 1.31E−07
rs7703618 rs12520430 0.906 0.266 5.16E−08
rs7703618 rs1405918 0.636 0.265 8.95E−08
rs7703618 rs13183434 0.907 0.262 5.17E−08
rs7703618 rs7446090 0.835 0.259 1.68E−07
rs7703618 rs4493682 0.833 0.259 1.97E−07
rs7703618 rs7720104 0.831 0.258 2.29E−07
rs7703618 rs4308490 0.831 0.258 2.75E−07
rs7703618 rs7711444 0.901 0.254 4.18E−07
rs7703618 rs6893494 0.831 0.250 2.47E−07
rs7703618 rs12523157 0.831 0.250 2.47E−07
rs7703618 rs11954598 0.831 0.250 2.47E−07
rs7703618 rs6864149 0.827 0.249 2.79E−07
rs7703618 rs12514414 0.823 0.247 4.51E−07
rs7703618 rs12520124 0.591 0.247 6.18E−07
rs7703618 rs6876773 0.822 0.246 6.39E−07
rs7703618 rs13187565 0.603 0.246 1.02E−06
rs7703618 rs7711446 0.821 0.246 7.62E−07
rs7703618 rs2337483 0.623 0.246 2.67E−07
rs7703618 rs7709262 0.727 0.242 5.48E−07
rs7703618 rs2580260 0.598 0.241 3.14E−07
rs7703618 rs6874127 0.726 0.240 8.96E−07
rs7703618 rs2589162 0.617 0.238 6.61E−07
rs7703618 rs6890289 0.823 0.237 7.69E−07
rs7703618 rs6892627 0.593 0.231 5.31E−07
rs7703618 rs6451814 0.593 0.231 5.31E−07
rs7703618 rs2337952 0.593 0.231 5.31E−07
rs7703618 rs2049656 0.593 0.231 5.31E−07
rs7703618 rs1483303 0.593 0.231 5.31E−07
rs7703618 rs17343002 1.000 0.229 2.29E−09
rs7703618 rs6893773 0.584 0.225 9.49E−07
rs7703618 rs2337951 0.584 0.225 9.49E−07
rs7703618 rs10078625 0.584 0.225 9.49E−07
rs7703618 rs10036065 0.581 0.224 7.95E−07
rs7703618 rs7705696 0.571 0.218 1.42E−06
rs7703618 rs2625494 0.586 0.218 1.15E−06
rs7703618 rs2580258 0.586 0.218 1.15E−06
rs7703618 rs1351720 0.586 0.218 1.15E−06
rs7703618 rs12110137 0.586 0.218 1.15E−06
rs7703618 rs10073636 0.586 0.218 1.15E−06
rs7703618 rs10043792 0.586 0.218 1.15E−06
rs7703618 rs1384732 0.573 0.215 2.06E−06
rs7703618 rs6451810 0.584 0.214 1.52E−06
rs7703618 rs6860200 0.550 0.212 2.47E−06
rs7703618 rs755048 0.567 0.211 2.41E−06
rs7703618 rs7732970 0.554 0.208 2.28E−06
rs7703618 rs6892594 0.554 0.208 2.28E−06
rs7703618 rs6451804 0.554 0.208 2.28E−06
rs7703618 rs7444176 0.572 0.208 2.55E−06
rs7703618 rs9687260 0.550 0.205 2.05E−06
rs7703618 rs7706116 0.550 0.205 2.05E−06
rs7703618 rs1501358 0.750 0.204 2.77E−06
rs7703618 rs12655983 0.750 0.204 2.77E−06
SNP C = rs2067980
rs2067980 rs13183434 0.931 0.863 1.68E−17
rs2067980 rs10044408 0.767 0.444 2.19E−09
rs2067980 rs1501358 0.713 0.436 1.93E−09
rs2067980 rs12655983 0.713 0.436 1.93E−09
rs2067980 rs6451796 0.779 0.434 7.64E−10
rs2067980 rs3923055 0.779 0.434 7.64E−10
rs2067980 rs1501361 0.779 0.434 7.64E−10
rs2067980 rs1392973 0.779 0.434 7.64E−10
rs2067980 rs6874127 0.778 0.430 9.74E−10
rs2067980 rs7709262 0.776 0.408 1.92E−09
rs2067980 rs930395 0.774 0.383 4.51E−09
rs2067980 rs6451795 0.640 0.350 8.08E−08
rs2067980 rs11948186 1.000 0.329 8.10E−11
rs2067980 rs10051592 1.000 0.329 8.10E−11
rs2067980 rs6861150 0.577 0.324 3.93E−07
rs2067980 rs10473387 0.743 0.298 4.83E−07
rs2067980 rs16902086 1.000 0.294 4.00E−10
rs2067980 rs7711697 1.000 0.291 4.65E−10
rs2067980 rs7716571 1.000 0.288 1.57E−09
rs2067980 rs10941679 0.759 0.288 1.55E−07
rs2067980 rs10043344 1.000 0.286 6.31E−10
rs2067980 rs7380559 1.000 0.283 6.59E−10
rs2067980 rs4518409 1.000 0.283 6.59E−10
rs2067980 rs4329028 1.000 0.283 1.89E−09
rs2067980 rs1438821 1.000 0.283 6.59E−10
rs2067980 rs1438820 1.000 0.283 6.59E−10
rs2067980 rs13362132 1.000 0.283 6.59E−10
rs2067980 rs13160259 1.000 0.283 6.59E−10
rs2067980 rs11958808 1.000 0.283 6.59E−10
rs2067980 rs1061310 1.000 0.283 6.59E−10
rs2067980 rs10512865 1.000 0.283 6.59E−10
rs2067980 rs1048758 1.000 0.283 6.59E−10
rs2067980 rs10044096 1.000 0.283 6.59E−10
rs2067980 rs9292913 1.000 0.280 7.66E−10
rs2067980 rs7380878 1.000 0.280 2.18E−09
rs2067980 rs13177711 1.000 0.280 7.66E−10
rs2067980 rs11949847 1.000 0.278 8.92E−10
rs2067980 rs7721731 0.736 0.276 1.54E−06
rs2067980 rs9790896 1.000 0.275 1.04E−09
rs2067980 rs1483309 0.753 0.270 2.69E−07
rs2067980 rs1483306 0.753 0.270 2.69E−07
rs2067980 rs13358718 0.753 0.270 2.69E−07
rs2067980 rs10073055 0.753 0.270 2.69E−07
rs2067980 rs10472404 0.750 0.268 5.28E−07
rs2067980 rs12697498 0.741 0.266 2.58E−06
rs2067980 rs4463188 1.000 0.259 3.60E−09
rs2067980 rs4571480 1.000 0.258 2.32E−09
rs2067980 rs2013513 1.000 0.258 2.32E−09
rs2067980 rs10941677 1.000 0.258 2.32E−09
rs2067980 rs4532370 0.750 0.256 4.73E−07
rs2067980 rs1483312 0.750 0.256 4.73E−07
rs2067980 rs1351719 0.750 0.256 4.73E−07
rs2067980 rs12656485 0.750 0.256 4.73E−07
rs2067980 rs6877477 0.802 0.255 1.27E−06
rs2067980 rs6868232 0.900 0.254 3.11E−07
rs2067980 rs7715731 0.893 0.254 8.48E−07
rs2067980 rs7735881 1.000 0.254 2.71E−09
rs2067980 rs7723539 1.000 0.254 2.71E−09
rs2067980 rs7720551 1.000 0.254 2.71E−09
rs2067980 rs714130 1.000 0.254 2.71E−09
rs2067980 rs6874055 1.000 0.254 7.25E−09
rs2067980 rs6861560 1.000 0.254 2.71E−09
rs2067980 rs6451770 1.000 0.254 2.71E−09
rs2067980 rs4419600 1.000 0.254 2.71E−09
rs2067980 rs4415085 1.000 0.254 2.71E−09
rs2067980 rs4321755 1.000 0.254 2.71E−09
rs2067980 rs2218081 1.000 0.254 2.71E−09
rs2067980 rs2165010 1.000 0.254 7.25E−09
rs2067980 rs1438825 1.000 0.254 2.71E−09
rs2067980 rs13156930 1.000 0.254 2.71E−09
rs2067980 rs12515012 1.000 0.254 2.71E−09
rs2067980 rs12187196 1.000 0.254 2.71E−09
rs2067980 rs4452566 0.813 0.254 5.19E−07
rs2067980 rs10040082 0.906 0.252 1.42E−07
rs2067980 rs3747479 0.900 0.252 3.55E−07
rs2067980 rs920329 1.000 0.251 3.15E−09
rs2067980 rs1821936 1.000 0.251 3.15E−09
rs2067980 rs10941678 1.000 0.251 3.15E−09
rs2067980 rs10805685 1.000 0.251 3.15E−09
rs2067980 rs7717459 0.905 0.249 1.51E−07
rs2067980 rs6893319 0.905 0.249 1.51E−07
rs2067980 rs6875933 0.905 0.249 1.51E−07
rs2067980 rs6872254 0.905 0.249 1.51E−07
rs2067980 rs6451778 0.905 0.249 1.51E−07
rs2067980 rs4373287 0.905 0.249 1.51E−07
rs2067980 rs1866406 0.905 0.249 1.51E−07
rs2067980 rs1438822 0.905 0.249 1.51E−07
rs2067980 rs1438819 0.905 0.249 1.51E−07
rs2067980 rs13189120 0.905 0.249 1.51E−07
rs2067980 rs13155698 0.905 0.249 1.51E−07
rs2067980 rs13154781 0.905 0.249 1.51E−07
rs2067980 rs10462080 0.905 0.249 1.51E−07
rs2067980 rs10065638 0.905 0.249 1.51E−07
rs2067980 rs10059086 0.905 0.249 1.51E−07
rs2067980 rs10057521 0.905 0.249 1.51E−07
rs2067980 rs10053247 0.905 0.249 1.51E−07
rs2067980 rs10041518 0.905 0.249 1.51E−07
rs2067980 rs10040488 0.905 0.249 1.51E−07
rs2067980 rs10039866 0.905 0.249 1.51E−07
rs2067980 rs6875287 0.904 0.249 2.50E−07
rs2067980 rs4492118 1.000 0.249 3.66E−09
rs2067980 rs2165009 1.000 0.249 3.66E−09
rs2067980 rs12522626 1.000 0.249 3.66E−09
rs2067980 rs12513749 0.903 0.248 3.02E−07
rs2067980 rs7736092 0.905 0.247 1.73E−07
rs2067980 rs6871052 0.905 0.247 1.73E−07
rs2067980 rs10070037 0.905 0.247 1.73E−07
rs2067980 rs11741772 0.899 0.247 4.65E−07
rs2067980 rs4642377 0.899 0.247 4.29E−07
rs2067980 rs6451806 0.734 0.245 1.74E−06
rs2067980 rs16901937 1.000 0.245 4.22E−09
rs2067980 rs7708506 0.904 0.244 1.99E−07
rs2067980 rs6894324 0.899 0.244 4.89E−07
rs2067980 rs4457088 0.899 0.244 4.89E−07
rs2067980 rs10038554 0.899 0.244 4.89E−07
rs2067980 rs2878967 0.747 0.243 8.09E−07
rs2067980 rs1564684 0.747 0.243 8.09E−07
rs2067980 rs12651949 0.864 0.242 0.000056
rs2067980 rs12518851 0.896 0.240 1.34E−06
rs2067980 rs1438827 0.904 0.240 2.40E−07
rs2067980 rs9292914 0.887 0.237 2.40E−06
rs2067980 rs11951760 0.892 0.230 1.31E−06
rs2067980 rs7447532 1.000 0.222 0.000037
rs2067980 rs5004228 1.000 0.222 0.000037
rs2067980 rs11750364 1.000 0.222 0.000038
rs2067980 rs13357090 1.000 0.222 0.000037
rs2067980 rs920328 0.901 0.221 5.82E−07
rs2067980 rs10462095 0.737 0.221 3.73E−06
rs2067980 rs16902199 0.736 0.219 2.22E−06
rs2067980 rs7724971 0.740 0.219 2.21E−06
rs2067980 rs7719703 0.740 0.219 2.21E−06
rs2067980 rs7700252 0.740 0.219 2.21E−06
rs2067980 rs6898646 0.740 0.219 2.21E−06
rs2067980 rs6894784 0.740 0.219 2.21E−06
rs2067980 rs6894273 0.740 0.219 2.21E−06
rs2067980 rs6886950 0.740 0.219 2.21E−06
rs2067980 rs4437383 0.740 0.219 2.21E−06
rs2067980 rs2337954 0.740 0.219 2.21E−06
rs2067980 rs16902221 0.740 0.219 2.21E−06
rs2067980 rs16902217 0.740 0.219 2.21E−06
rs2067980 rs1483308 0.740 0.219 2.21E−06
rs2067980 rs13359915 0.740 0.219 2.21E−06
rs2067980 rs13357427 0.740 0.219 2.21E−06
rs2067980 rs12109155 0.740 0.219 2.21E−06
rs2067980 rs10473389 0.740 0.219 2.21E−06
rs2067980 rs10074312 0.740 0.219 2.21E−06
rs2067980 rs10066821 0.740 0.219 2.21E−06
rs2067980 rs1852595 0.739 0.217 2.48E−06
rs2067980 rs13159362 0.739 0.217 2.48E−06
rs2067980 rs10214369 0.739 0.217 2.48E−06
rs2067980 rs13361609 0.739 0.215 2.79E−06
rs2067980 rs7445730 0.723 0.211 6.63E−06
rs2067980 rs4339358 0.723 0.211 6.63E−06
rs2067980 rs4242125 0.723 0.211 6.63E−06
rs2067980 rs4560554 0.728 0.209 0.000011
rs2067980 rs4626346 0.737 0.208 3.54E−06
rs2067980 rs10052977 0.737 0.208 3.54E−06
rs2067980 rs3935086 0.730 0.208 7.29E−06
rs2067980 rs4132311 0.697 0.204 0.000013
rs2067980 rs4283798 0.730 0.201 8.64E−06
SNP D = rs10035564
rs10035564 rs11948186 1.000 1.000 1.31E−34
rs10035564 rs10051592 1.000 1.000 1.31E−34
rs10035564 rs16902086 1.000 0.894 6.99E−30
rs10035564 rs9292914 0.954 0.874 8.22E−24
rs10035564 rs7711697 0.959 0.822 8.14E−25
rs10035564 rs3935086 1.000 0.819 2.75E−26
rs10035564 rs10512875 1.000 0.819 2.75E−26
rs10035564 rs7380559 0.959 0.794 1.31E−24
rs10035564 rs4518409 0.959 0.794 1.31E−24
rs10035564 rs1438821 0.959 0.794 1.31E−24
rs10035564 rs1438820 0.959 0.794 1.31E−24
rs10035564 rs13362132 0.959 0.794 1.31E−24
rs10035564 rs13160259 0.959 0.794 1.31E−24
rs10035564 rs11958808 0.959 0.794 1.31E−24
rs10035564 rs1061310 0.959 0.794 1.31E−24
rs10035564 rs10512865 0.959 0.794 1.31E−24
rs10035564 rs1048758 0.959 0.794 1.31E−24
rs10035564 rs10044096 0.959 0.794 1.31E−24
rs10035564 rs4329028 0.958 0.793 3.59E−24
rs10035564 rs7716571 0.958 0.792 4.90E−24
rs10035564 rs13177711 0.959 0.792 1.92E−24
rs10035564 rs9292913 0.959 0.792 3.86E−24
rs10035564 rs7380878 0.958 0.791 5.25E−24
rs10035564 rs11949847 0.959 0.790 2.82E−24
rs10035564 rs9790896 0.959 0.790 4.42E−24
rs10035564 rs10043344 0.958 0.789 7.72E−24
rs10035564 rs11741772 0.880 0.745 1.24E−21
rs10035564 rs6875287 0.882 0.745 1.24E−21
rs10035564 rs7717459 0.880 0.721 1.72E−21
rs10035564 rs6893319 0.880 0.721 1.72E−21
rs10035564 rs6872254 0.880 0.721 1.72E−21
rs10035564 rs6451778 0.880 0.721 1.72E−21
rs10035564 rs4373287 0.880 0.721 1.72E−21
rs10035564 rs1866406 0.880 0.721 1.72E−21
rs10035564 rs1438822 0.880 0.721 1.72E−21
rs10035564 rs1438819 0.880 0.721 1.72E−21
rs10035564 rs13189120 0.880 0.721 1.72E−21
rs10035564 rs13155698 0.880 0.721 1.72E−21
rs10035564 rs13154781 0.880 0.721 1.72E−21
rs10035564 rs10462080 0.880 0.721 1.72E−21
rs10035564 rs10065638 0.880 0.721 1.72E−21
rs10035564 rs10059086 0.880 0.721 1.72E−21
rs10035564 rs10057521 0.880 0.721 1.72E−21
rs10035564 rs10053247 0.880 0.721 1.72E−21
rs10035564 rs10041518 0.880 0.721 1.72E−21
rs10035564 rs10040488 0.880 0.721 1.72E−21
rs10035564 rs10039866 0.880 0.721 1.72E−21
rs10035564 rs12513749 0.880 0.721 3.45E−21
rs10035564 rs6875933 0.880 0.720 3.43E−21
rs10035564 rs7736092 0.880 0.719 2.46E−21
rs10035564 rs6871052 0.880 0.719 2.46E−21
rs10035564 rs10070037 0.880 0.719 2.46E−21
rs10035564 rs7708506 0.880 0.719 3.76E−21
rs10035564 rs4642377 0.877 0.718 4.69E−21
rs10035564 rs12651949 0.859 0.718 3.09E−17
rs10035564 rs3747479 0.877 0.717 6.57E−21
rs10035564 rs6894324 0.877 0.716 6.70E−21
rs10035564 rs4457088 0.877 0.716 6.70E−21
rs10035564 rs10038554 0.877 0.716 6.70E−21
rs10035564 rs6868232 0.874 0.715 1.28E−20
rs10035564 rs10040082 0.877 0.714 9.58E−21
rs10035564 rs11951760 0.875 0.706 1.16E−19
rs10035564 rs7715731 0.866 0.704 3.88E−19
rs10035564 rs12518851 0.871 0.700 1.86E−18
rs10035564 rs1438827 0.879 0.692 1.28E−20
rs10035564 rs10941677 0.915 0.669 3.88E−20
rs10035564 rs7735881 0.914 0.649 9.79E−20
rs10035564 rs7723539 0.914 0.649 9.79E−20
rs10035564 rs714130 0.914 0.649 9.79E−20
rs10035564 rs6861560 0.914 0.649 9.79E−20
rs10035564 rs6451770 0.914 0.649 9.79E−20
rs10035564 rs4419600 0.914 0.649 9.79E−20
rs10035564 rs4415085 0.914 0.649 9.79E−20
rs10035564 rs4321755 0.914 0.649 9.79E−20
rs10035564 rs2218081 0.914 0.649 9.79E−20
rs10035564 rs1438825 0.914 0.649 9.79E−20
rs10035564 rs13156930 0.914 0.649 9.79E−20
rs10035564 rs12515012 0.914 0.649 9.79E−20
rs10035564 rs12187196 0.914 0.649 9.79E−20
rs10035564 rs7720551 0.914 0.648 1.95E−19
rs10035564 rs6874055 0.912 0.647 2.45E−19
rs10035564 rs2165010 0.912 0.647 2.45E−19
rs10035564 rs920329 0.913 0.647 1.91E−19
rs10035564 rs10941678 0.914 0.646 1.42E−19
rs10035564 rs10805685 0.914 0.646 1.42E−19
rs10035564 rs1821936 0.914 0.646 2.84E−19
rs10035564 rs4463188 0.911 0.645 4.78E−19
rs10035564 rs4571480 0.912 0.644 3.53E−19
rs10035564 rs2013513 0.912 0.644 3.53E−19
rs10035564 rs4492118 0.914 0.643 2.06E−19
rs10035564 rs2165009 0.914 0.643 2.06E−19
rs10035564 rs12522626 0.914 0.643 2.06E−19
rs10035564 rs16901937 0.913 0.625 4.75E−19
rs10035564 rs920328 0.834 0.580 4.14E−17
rs10035564 rs6869488 1.000 0.489 1.22E−15
rs10035564 rs4460145 1.000 0.489 1.22E−15
rs10035564 rs7731099 1.000 0.478 4.71E−15
rs10035564 rs7716101 1.000 0.463 7.32E−15
rs10035564 rs7709661 1.000 0.463 7.32E−15
rs10035564 rs7701679 1.000 0.463 8.99E−15
rs10035564 rs6894974 1.000 0.463 7.32E−15
rs10035564 rs6885307 1.000 0.463 7.32E−15
rs10035564 rs4533894 1.000 0.463 7.32E−15
rs10035564 rs12521639 1.000 0.463 7.32E−15
rs10035564 rs12054976 1.000 0.463 7.32E−15
rs10035564 rs4371761 1.000 0.452 2.30E−14
rs10035564 rs4502832 1.000 0.438 4.20E−14
rs10035564 rs4485937 1.000 0.438 4.20E−14
rs10035564 rs4389695 1.000 0.438 4.20E−14
rs10035564 rs4296810 1.000 0.438 4.20E−14
rs10035564 rs12522398 1.000 0.438 6.21E−14
rs10035564 rs10941692 1.000 0.438 4.20E−14
rs10035564 rs10044408 1.000 0.425 2.83E−13
rs10035564 rs7720104 1.000 0.400 1.04E−12
rs10035564 rs4493682 1.000 0.400 8.66E−13
rs10035564 rs4308490 1.000 0.400 1.04E−12
rs10035564 rs6876773 1.000 0.390 1.45E−12
rs10035564 rs7711446 1.000 0.388 2.05E−12
rs10035564 rs6893494 1.000 0.388 1.21E−12
rs10035564 rs12523157 1.000 0.388 1.21E−12
rs10035564 rs12514414 1.000 0.388 2.44E−12
rs10035564 rs11954598 1.000 0.388 1.21E−12
rs10035564 rs6864149 1.000 0.388 1.44E−12
rs10035564 rs6898476 1.000 0.375 5.36E−12
rs10035564 rs6890289 1.000 0.375 4.52E−12
rs10035564 rs4533895 1.000 0.375 4.52E−12
rs10035564 rs7709262 0.869 0.370 1.13E−10
rs10035564 rs1405918 0.768 0.368 6.67E−11
rs10035564 rs4357042 1.000 0.364 7.23E−12
rs10035564 rs1909937 1.000 0.364 6.12E−12
rs10035564 rs1472584 1.000 0.364 6.12E−12
rs10035564 rs1392970 1.000 0.364 6.12E−12
rs10035564 rs12523398 1.000 0.364 6.12E−12
rs10035564 rs12523359 1.000 0.364 7.23E−12
rs10035564 rs12522305 1.000 0.364 7.23E−12
rs10035564 rs12521953 1.000 0.364 6.12E−12
rs10035564 rs12516488 1.000 0.364 6.12E−12
rs10035564 rs12153189 1.000 0.364 6.12E−12
rs10035564 rs12153053 1.000 0.364 6.12E−12
rs10035564 rs11953498 1.000 0.364 6.12E−12
rs10035564 rs10941693 1.000 0.364 6.12E−12
rs10035564 rs16902068 1.000 0.364 6.12E−12
rs10035564 rs12514615 1.000 0.364 6.12E−12
rs10035564 rs6874127 0.865 0.357 3.92E−10
rs10035564 rs13187565 0.747 0.357 1.02E−09
rs10035564 rs4613718 1.000 0.356 6.97E−14
rs10035564 rs2625494 0.766 0.354 9.14E−11
rs10035564 rs2580258 0.766 0.354 9.14E−11
rs10035564 rs1351720 0.766 0.354 9.14E−11
rs10035564 rs12110137 0.766 0.354 9.14E−11
rs10035564 rs10073636 0.766 0.354 9.14E−11
rs10035564 rs10043792 0.766 0.354 9.14E−11
rs10035564 rs12520124 0.722 0.351 7.29E−10
rs10035564 rs4566805 1.000 0.350 3.68E−11
rs10035564 rs12520430 1.000 0.349 2.49E−11
rs10035564 rs6451796 0.863 0.345 5.58E−10
rs10035564 rs3923055 0.863 0.345 5.58E−10
rs10035564 rs1501361 0.863 0.345 5.58E−10
rs10035564 rs1392973 0.863 0.345 5.58E−10
rs10035564 rs2589162 0.756 0.340 6.65E−10
rs10035564 rs13183434 1.000 0.340 2.99E−11
rs10035564 rs7446090 0.922 0.339 6.66E−10
rs10035564 rs2580260 0.727 0.338 3.86E−10
rs10035564 rs7711444 1.000 0.337 6.16E−11
rs10035564 rs7446182 0.724 0.326 7.25E−10
rs10035564 rs6892627 0.724 0.326 7.25E−10
rs10035564 rs6451814 0.724 0.326 7.25E−10
rs10035564 rs2337952 0.724 0.326 7.25E−10
rs10035564 rs2049656 0.724 0.326 7.25E−10
rs10035564 rs1483303 0.724 0.326 7.25E−10
rs10035564 rs12654213 0.721 0.324 6.86E−10
rs10035564 rs7717787 0.715 0.323 1.43E−09
rs10035564 rs7705696 0.715 0.323 1.43E−09
rs10035564 rs6893773 0.717 0.322 1.41E−09
rs10035564 rs2337951 0.717 0.322 1.41E−09
rs10035564 rs10078625 0.717 0.322 1.41E−09
rs10035564 rs2337483 0.721 0.322 1.78E−09
rs10035564 rs10036065 0.713 0.320 1.23E−09
rs10035564 rs755048 0.712 0.315 2.62E−09
rs10035564 rs1384732 0.710 0.314 3.47E−09
rs10035564 rs6451810 0.718 0.308 2.46E−09
rs10035564 rs6860200 0.678 0.307 5.17E−09
rs10035564 rs7732970 0.682 0.300 4.88E−09
rs10035564 rs6892594 0.682 0.300 4.88E−09
rs10035564 rs6451804 0.682 0.300 4.88E−09
rs10035564 rs7711528 0.838 0.299 3.91E−08
rs10035564 rs9687260 0.679 0.298 4.57E−09
rs10035564 rs7706116 0.679 0.298 4.57E−09
rs10035564 rs6451802 0.617 0.293 5.15E−08
rs10035564 rs4455566 0.672 0.293 8.79E−09
rs10035564 rs6451793 0.784 0.285 8.16E−08
rs10035564 rs10039283 0.674 0.283 1.09E−08
rs10035564 rs6895191 0.672 0.279 1.47E−08
rs10035564 rs6878425 0.672 0.279 1.47E−08
rs10035564 rs10462097 0.672 0.279 1.47E−08
rs10035564 rs7444176 0.672 0.279 2.05E−08
rs10035564 rs16902084 0.780 0.273 1.87E−07
rs10035564 rs1501358 0.838 0.272 4.94E−08
rs10035564 rs12655983 0.838 0.272 4.94E−08
rs10035564 rs16902083 0.778 0.265 1.49E−07
rs10035564 rs10941679 0.634 0.263 1.41E−07
rs10035564 rs6882139 0.632 0.258 6.10E−08
rs10035564 rs6451843 0.624 0.252 1.14E−07
rs10035564 rs6862655 0.815 0.250 3.35E−08
rs10035564 rs10059745 0.815 0.250 3.35E−08
rs10035564 rs7718785 0.521 0.246 2.76E−07
rs10035564 rs7444405 0.626 0.244 1.36E−07
rs10035564 rs4639238 0.812 0.243 4.87E−08
rs10035564 rs12374507 0.812 0.243 4.87E−08
rs10035564 rs10066953 0.812 0.243 4.87E−08
rs10035564 rs930395 0.686 0.242 4.25E−07
rs10035564 rs6866354 0.807 0.241 8.01E−08
rs10035564 rs10054521 0.811 0.239 6.85E−08
rs10035564 rs12520938 0.589 0.235 3.01E−07
rs10035564 rs7709131 0.592 0.234 3.01E−07
rs10035564 rs7445572 0.592 0.234 3.01E−07
rs10035564 rs6861150 0.818 0.227 8.14E−07
rs10035564 rs6451795 0.757 0.222 1.20E−06
rs10035564 rs13156198 0.585 0.220 6.49E−07
rs10035564 rs4569881 0.553 0.211 1.33E−06
rs10035564 rs13361919 0.553 0.211 1.33E−06
rs10035564 rs13185201 0.553 0.211 1.33E−06
rs10035564 rs10941740 0.561 0.210 4.73E−06
rs10035564 rs12697517 0.603 0.209 2.07E−06
rs10035564 rs12697503 0.600 0.209 2.90E−06
rs10035564 rs7443976 0.578 0.208 1.36E−06
SNP E = rs11743392
rs11743392 rs13179818 0.927 0.831 4.07E−26
rs11743392 
rs2625494 1.000 0.527 1.80E−19 rs11743392rs2580258 1.000 0.527 1.80E−19 rs11743392 rs1351720 1.000 0.527 1.80E−19rs11743392 rs6451810 1.000 0.527 3.30E−19 rs11743392 rs12110137 1.0000.527 1.80E−19 rs11743392 rs10073636 1.000 0.527 1.80E−19 rs11743392rs10043792 1.000 0.527 1.80E−19 rs11743392 rs1384732 1.000 0.5174.95E−19 rs11743392 rs755048 1.000 0.513 8.25E−19 rs11743392 rs74461821.000 0.509 6.24E−19 rs11743392 rs2337483 1.000 0.509 1.13E−18rs11743392 rs2049656 1.000 0.509 6.24E−19 rs11743392 rs1483303 1.0000.509 6.24E−19 rs11743392 rs6892627 1.000 0.509 6.24E−19 rs11743392rs6451814 1.000 0.509 6.24E−19 rs11743392 rs2589162 1.000 0.509 2.03E−18rs11743392 rs2337952 1.000 0.509 6.24E−19 rs11743392 rs1405918 1.0000.509 6.24E−19 rs11743392 rs12654213 1.000 0.509 6.24E−19 rs11743392rs7444176 1.000 0.505 1.08E−17 rs11743392 rs6893773 1.000 0.505 1.03E−18rs11743392 rs2580260 1.000 0.505 1.03E−18 rs11743392 rs2337951 1.0000.505 1.03E−18 rs11743392 rs10078625 1.000 0.505 1.03E−18 rs11743392rs10036065 1.000 0.505 1.03E−18 rs11743392 rs7717787 1.000 0.5001.72E−18 rs11743392 rs7705696 1.000 0.500 1.72E−18 rs11743392 rs77329701.000 0.492 2.09E−18 rs11743392 rs6892594 1.000 0.492 2.09E−18rs11743392 rs6451804 1.000 0.492 2.09E−18 rs11743392 rs9687260 1.0000.492 2.09E−18 rs11743392 rs7706116 1.000 0.492 2.09E−18 rs11743392rs4455566 1.000 0.487 3.45E−18 rs11743392 rs6860200 1.000 0.483 5.74E−18rs11743392 rs13187565 1.000 0.479 1.77E−16 rs11743392 rs12520124 1.0000.475 5.19E−17 rs11743392 rs7444405 0.951 0.461 1.77E−15 rs11743392rs10039283 0.951 0.461 1.77E−15 rs11743392 rs7443976 0.951 0.4603.54E−15 rs11743392 rs6895191 0.950 0.460 3.12E−15 rs11743392 rs68784250.950 0.460 3.12E−15 rs11743392 rs10462097 0.950 0.460 3.12E−15rs11743392 rs6882139 0.950 0.444 5.53E−15 rs11743392 rs13156198 0.9490.444 1.11E−14 rs11743392 rs10041478 0.948 0.438 3.12E−14 rs11743392rs6451843 0.948 0.438 3.57E−14 rs11743392 rs10042199 0.947 0.4338.75E−14 rs11743392 rs12520938 0.948 0.429 4.12E−14 rs11743392 
rs77091310.948 0.428 3.36E−14 rs11743392 rs7445572 0.948 0.428 3.36E−14rs11743392 rs6884716 0.948 0.428 3.36E−14 rs11743392 rs4302598 0.9480.428 3.36E−14 rs11743392 rs4277924 0.948 0.428 3.36E−14 rs11743392rs13361118 0.948 0.428 3.36E−14 rs11743392 rs13155231 0.948 0.4283.36E−14 rs11743392 rs12654375 0.948 0.428 3.36E−14 rs11743392rs12652235 0.948 0.428 3.36E−14 rs11743392 rs12523291 0.948 0.4283.36E−14 rs11743392 rs12188166 0.948 0.428 3.36E−14 rs11743392rs10941727 0.948 0.428 3.36E−14 rs11743392 rs6451802 1.000 0.4289.62E−16 rs11743392 rs10805706 0.948 0.428 6.73E−14 rs11743392 rs45698810.947 0.413 9.91E−14 rs11743392 rs13361919 0.947 0.413 9.91E−14rs11743392 rs13185201 0.947 0.413 9.91E−14 rs11743392 rs10941740 0.9420.410 1.20E−12 rs11743392 rs12697503 0.856 0.402 3.42E−12 rs11743392rs13186830 0.944 0.395 1.42E−12 rs11743392 rs12697517 0.848 0.3901.27E−11 rs11743392 rs7713759 0.852 0.383 7.89E−12 rs11743392 rs131647220.852 0.383 7.89E−12 rs11743392 rs12153540 0.852 0.383 7.89E−12rs11743392 rs11958686 0.849 0.381 1.35E−11 rs11743392 rs12697523 0.8510.378 1.23E−11 rs11743392 rs12690678 0.851 0.378 1.23E−11 rs11743392rs12656953 0.851 0.378 1.23E−11 rs11743392 rs7718785 1.000 0.3623.18E−14 rs11743392 rs7719500 1.000 0.276 1.59E−11 rs11743392 rs25891811.000 0.276 1.59E−11 rs11743392 rs4367308 1.000 0.276 1.59E−11rs11743392 rs4282323 1.000 0.276 1.59E−11 rs11743392 rs10041772 1.0000.276 1.59E−11 rs11743392 rs10041767 1.000 0.276 1.59E−11 rs11743392rs4283798 1.000 0.274 2.54E−11 rs11743392 rs13188585 1.000 0.2706.52E−11 rs11743392 rs4626346 1.000 0.265 3.64E−11 rs11743392 rs133616091.000 0.265 4.42E−11 rs11743392 rs4975889 1.000 0.260 6.07E−11rs11743392 rs1852595 1.000 0.260 6.07E−11 rs11743392 rs10214369 1.0000.260 6.07E−11 rs11743392 rs7700252 1.000 0.255 8.24E−11 rs11743392rs1483308 1.000 0.255 8.24E−11 rs11743392 rs9686580 1.000 0.255 8.24E−11rs11743392 rs7724971 1.000 0.255 8.24E−11 rs11743392 rs7719703 1.0000.255 8.24E−11 rs11743392 rs7714713 1.000 0.255 8.24E−11 
rs11743392rs7703405 1.000 0.255 8.24E−11 rs11743392 rs7447232 1.000 0.255 8.24E−11rs11743392 rs6898646 1.000 0.255 8.24E−11 rs11743392 rs6894784 1.0000.255 8.24E−11 rs11743392 rs6894273 1.000 0.255 8.24E−11 rs11743392rs6886950 1.000 0.255 8.24E−11 rs11743392 rs4560554 1.000 0.255 1.63E−10rs11743392 rs4452566 1.000 0.255 1.16E−10 rs11743392 rs4437383 1.0000.255 8.24E−11 rs11743392 rs4407637 1.000 0.255 8.24E−11 rs11743392rs2337954 1.000 0.255 8.24E−11 rs11743392 rs16902221 1.000 0.2558.24E−11 rs11743392 rs16902217 1.000 0.255 8.24E−11 rs11743392rs13359915 1.000 0.255 8.24E−11 rs11743392 rs13357427 1.000 0.2558.24E−11 rs11743392 rs13159362 1.000 0.255 1.16E−10 rs11743392rs12109155 1.000 0.255 8.24E−11 rs11743392 rs10074312 1.000 0.2558.24E−11 rs11743392 rs10066821 1.000 0.255 8.24E−11 rs11743392rs10473389 1.000 0.255 8.24E−11 rs11743392 rs6414908 1.000 0.2532.47E−10 rs11743392 rs10462095 1.000 0.252 1.87E−10 rs11743392 rs74457301.000 0.249 1.37E−10 rs11743392 rs4339358 1.000 0.249 1.37E−10rs11743392 rs4242125 1.000 0.249 1.37E−10 rs11743392 rs16902199 1.0000.249 1.37E−10 rs11743392 rs12655230 1.000 0.249 1.37E−10 rs11743392rs6877477 1.000 0.243 3.19E−10 rs11743392 rs7706959 1.000 0.239 3.06E−10rs11743392 rs7446602 1.000 0.239 3.06E−10 rs11743392 rs13356124 1.0000.239 3.06E−10 rs11743392 rs4132311 1.000 0.237 3.86E−10 rs11743392rs2878967 1.000 0.234 4.04E−10 rs11743392 rs16902186 1.000 0.2344.04E−10 rs11743392 rs1564684 1.000 0.234 4.04E−10 rs11743392 rs14059161.000 0.234 4.04E−10 rs11743392 rs12653475 1.000 0.234 4.04E−10rs11743392 rs12697498 1.000 0.232 1.15E−09 rs11743392 rs7722380 1.0000.228 6.74E−10 rs11743392 rs4532370 1.000 0.224 8.78E−10 rs11743392rs1483312 1.000 0.224 8.78E−10 rs11743392 rs1351719 1.000 0.224 8.78E−10rs11743392 rs12656485 1.000 0.224 8.78E−10 rs11743392 rs10052977 0.9190.224 7.24E−08 rs11743392 rs6451806 1.000 0.219 1.98E−09 rs11743392rs1483309 1.000 0.215 1.89E−09 rs11743392 rs1483306 1.000 0.215 1.89E−09rs11743392 rs13358718 1.000 0.215 
1.89E−09 rs11743392 rs10472404 1.0000.215 1.89E−09 rs11743392 rs10073055 1.000 0.215 1.89E−09 rs11743392rs12518113 0.846 0.209 6.42E−07 rs11743392 rs4288123 1.000 0.2093.15E−09 rs11743392 rs17268006 0.767 0.205 1.34E−06 rs11743392 rs49759240.844 0.204 8.60E−07 rs11743392 rs4128583 0.844 0.204 8.60E−07rs11743392 rs12697524 0.844 0.204 8.60E−07 rs11743392 rs12523279 0.8440.204 8.60E−07 rs11743392 rs12522090 0.844 0.204 8.60E−07 rs11743392rs12019302 0.844 0.204 8.60E−07 rs11743392 rs11949184 0.844 0.2048.60E−07 rs11743392 rs10941798 0.844 0.204 8.60E−07 rs11743392rs10462111 0.844 0.204 8.60E−07 rs11743392 rs10941748 0.844 0.2041.01E−06 rs11743392 rs4975948 0.836 0.202 1.67E−06 rs11743392 rs77217311.000 0.201 9.28E−09 SNP F = rs7716600 rs7716600 rs930395 1.000 1.0003.05E−27 rs7716600 rs10941679 1.000 0.777 1.77E−21 rs7716600 rs126519491.000 0.505 4.24E−14 rs7716600 rs7715731 1.000 0.482 2.21E−14 rs7716600rs10040082 1.000 0.480 2.37E−15 rs7716600 rs6868232 1.000 0.473 8.95E−15rs7716600 rs7717459 1.000 0.471 2.94E−15 rs7716600 rs4642377 1.000 0.4718.77E−15 rs7716600 rs10041518 1.000 0.471 2.94E−15 rs7716600 rs100404881.000 0.471 2.94E−15 rs7716600 rs10039866 1.000 0.471 2.94E−15 rs7716600rs6893319 1.000 0.471 2.94E−15 rs7716600 rs6875933 1.000 0.471 2.94E−15rs7716600 rs6872254 1.000 0.471 2.94E−15 rs7716600 rs6451778 1.000 0.4712.94E−15 rs7716600 rs4373287 1.000 0.471 2.94E−15 rs7716600 rs18664061.000 0.471 2.94E−15 rs7716600 rs1438822 1.000 0.471 2.94E−15 rs7716600rs1438819 1.000 0.471 2.94E−15 rs7716600 rs13189120 1.000 0.471 2.94E−15rs7716600 rs13155698 1.000 0.471 2.94E−15 rs7716600 rs13154781 1.0000.471 2.94E−15 rs7716600 rs12513749 1.000 0.471 2.94E−15 rs7716600rs10462080 1.000 0.471 2.94E−15 rs7716600 rs10065638 1.000 0.4712.94E−15 rs7716600 rs10059086 1.000 0.471 2.94E−15 rs7716600 rs100575211.000 0.471 2.94E−15 rs7716600 rs10053247 1.000 0.471 2.94E−15 rs7716600rs7736092 1.000 0.468 3.68E−15 rs7716600 rs4457088 1.000 0.468 1.09E−14rs7716600 rs10038554 1.000 
0.468 1.09E−14 rs7716600 rs6894324 1.0000.468 1.09E−14 rs7716600 rs6871052 1.000 0.468 3.68E−15 rs7716600rs3747479 1.000 0.468 1.09E−14 rs7716600 rs10070037 1.000 0.468 3.68E−15rs7716600 rs7708506 1.000 0.465 4.60E−15 rs7716600 rs6875287 1.000 0.4611.69E−14 rs7716600 rs11741772 1.000 0.461 4.98E−14 rs7716600 rs125188511.000 0.459 2.75E−14 rs7716600 rs1438827 1.000 0.454 6.91E−15 rs7716600rs11951760 1.000 0.451 3.30E−14 rs7716600 rs10043344 1.000 0.4441.35E−14 rs7716600 rs11958808 1.000 0.437 1.57E−14 rs7716600 rs100440961.000 0.437 1.57E−14 rs7716600 rs7711697 1.000 0.437 4.46E−14 rs7716600rs7380559 1.000 0.437 1.57E−14 rs7716600 rs4518409 1.000 0.437 1.57E−14rs7716600 rs4329028 1.000 0.437 4.46E−14 rs7716600 rs1438821 1.000 0.4371.57E−14 rs7716600 rs1438820 1.000 0.437 1.57E−14 rs7716600 rs133621321.000 0.437 1.57E−14 rs7716600 rs13160259 1.000 0.437 1.57E−14 rs7716600rs1061310 1.000 0.437 1.57E−14 rs7716600 rs10512865 1.000 0.437 1.57E−14rs7716600 rs1048758 1.000 0.437 1.57E−14 rs7716600 rs9292913 1.000 0.4341.96E−14 rs7716600 rs7380878 1.000 0.434 5.55E−14 rs7716600 rs77165711.000 0.434 5.55E−14 rs7716600 rs13177711 1.000 0.434 1.96E−14 rs7716600rs11949847 1.000 0.431 2.45E−14 rs7716600 rs920328 1.000 0.421 3.47E−14rs7716600 rs9292914 1.000 0.404 7.68E−11 rs7716600 rs4571480 1.000 0.4001.17E−13 rs7716600 rs2013513 1.000 0.400 1.17E−13 rs7716600 rs77358811.000 0.392 1.56E−13 rs7716600 rs7723539 1.000 0.392 1.56E−13 rs7716600rs7720551 1.000 0.392 1.56E−13 rs7716600 rs714130 1.000 0.392 1.56E−13rs7716600 rs6874055 1.000 0.392 4.15E−13 rs7716600 rs6861560 1.000 0.3921.56E−13 rs7716600 rs6451770 1.000 0.392 1.56E−13 rs7716600 rs44196001.000 0.392 1.56E−13 rs7716600 rs4415085 1.000 0.392 1.56E−13 rs7716600rs4321755 1.000 0.392 1.56E−13 rs7716600 rs2218081 1.000 0.392 1.56E−13rs7716600 rs2165010 1.000 0.392 4.15E−13 rs7716600 rs1438825 1.000 0.3921.56E−13 rs7716600 rs13156930 1.000 0.392 1.56E−13 rs7716600 rs125150121.000 0.392 1.56E−13 rs7716600 rs12187196 1.000 0.392 
1.56E−13 rs7716600rs4463188 1.000 0.390 7.67E−13 rs7716600 rs920329 1.000 0.389 1.95E−13rs7716600 rs1821936 1.000 0.389 1.95E−13 rs7716600 rs10941678 1.0000.389 1.95E−13 rs7716600 rs10941677 1.000 0.389 5.16E−13 rs7716600rs10805685 1.000 0.389 1.95E−13 rs7716600 rs4492118 1.000 0.385 2.44E−13rs7716600 rs2165009 1.000 0.385 2.44E−13 rs7716600 rs12522626 1.0000.385 2.44E−13 rs7716600 rs16901937 1.000 0.378 3.20E−13 rs7716600rs9790896 0.931 0.370 6.39E−11 rs7716600 rs1482698 0.926 0.328 2.69E−10rs7716600 rs13183434 0.685 0.314 1.52E−07 rs7716600 rs1482685 0.8480.256 3.88E−08 rs7716600 rs1384451 0.848 0.256 3.88E−08 rs7716600rs2200123 0.757 0.252 8.51E−07 rs7716600 rs11749656 1.000 0.248 4.20E−06rs7716600 rs1482667 0.779 0.234 2.90E−07 rs7716600 rs11948186 0.6790.233 9.05E−07 rs7716600 rs10051592 0.679 0.233 9.05E−07 rs7716600rs10473355 0.770 0.231 5.67E−07 rs7716600 rs10472394 0.778 0.2313.47E−07 rs7716600 rs2877162 0.777 0.229 3.52E−07 rs7716600 rs23305510.777 0.229 3.52E−07 rs7716600 rs10055789 0.777 0.229 3.52E−07 rs7716600rs10055953 0.776 0.226 4.20E−07 rs7716600 rs987852 0.775 0.221 5.02E−07rs7716600 rs4242112 0.775 0.221 5.02E−07 rs7716600 rs2330553 0.775 0.2215.02E−07 rs7716600 rs12054807 0.775 0.221 5.02E−07 rs7716600 rs28771630.774 0.219 5.99E−07 rs7716600 rs10941665 0.774 0.219 5.99E−07 rs7716600rs7356597 0.763 0.214 1.35E−06 rs7716600 rs10473354 0.760 0.207 1.87E−06SNP 1 is the Illumina SNP with the lowest P-value in each of equivalenceclasses A-F. SNP 2 is the HapMap SNP that is correlated to SNP 1. D′ isthe mean D′ value across all combinations of the alleles of SNP 1 andSNP 2. R2 is the square of the correlation coefficient between the twoSNPs. p-min is the P-value corresponding to the strongest linkagedisequilibrium observed between alleles of SNP 1 and SNP 2.
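The D′ and R2 columns above are standard pairwise linkage disequilibrium measures. As a minimal sketch, assuming the four two-SNP haplotype frequencies have already been estimated (for example by an EM algorithm, which is not shown), both quantities can be computed as follows. The function name `pairwise_ld` and the 1/2 allele coding are illustrative assumptions. For two biallelic SNPs, |D′| is the same whichever pair of alleles is chosen, so this sketch does not attempt to reproduce the averaging over allele combinations described in the table legend.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PairwiseLD:
    d_prime: float    # |D'|, the normalized disequilibrium coefficient
    r_squared: float  # squared correlation between the two SNPs' alleles


def pairwise_ld(h11: float, h12: float, h21: float, h22: float) -> PairwiseLD:
    """Compute |D'| and r^2 for two biallelic SNPs.

    hij is the frequency of the haplotype carrying allele i at SNP 1 and
    allele j at SNP 2 (alleles coded 1/2). The four values must sum to 1.
    """
    if not math.isclose(h11 + h12 + h21 + h22, 1.0, abs_tol=1e-9):
        raise ValueError("haplotype frequencies must sum to 1")
    p1 = h11 + h12  # frequency of allele 1 at SNP 1
    q1 = h11 + h21  # frequency of allele 1 at SNP 2
    d = h11 - p1 * q1  # raw disequilibrium coefficient D

    # The maximum attainable |D| depends on the sign of D.
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denom = p1 * (1 - p1) * q1 * (1 - q1)
    r_squared = d * d / denom if denom > 0 else 0.0
    return PairwiseLD(d_prime, r_squared)


# A rarer allele nested on the background of a commoner one: D' = 1, r^2 < 1.
ld = pairwise_ld(h11=0.2, h12=0.3, h21=0.0, h22=0.5)
print(f"D'={ld.d_prime:.2f}, r2={ld.r_squared:.2f}")  # D'=1.00, r2=0.25
```

The usage line mirrors the situation described for G-rs10941679 and T-rs4415084 in Example 2: when one allele is carried almost only on the background of another, D′ approaches 1 while r² stays well below 1, because the two alleles differ in frequency.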

TABLE 4 Sequence Contexts of Key SNPs.

Key SNP A: rs4415084 deCODE Name: SG05S3092 (SEQ ID NO:235)
caggttatgctacttccctggaggacctctcaaaaggaagctgtttgttctatttctttctcatctgtcccaggactaggtattgcattaggagatccettgcttcccactgctgcttttaaatcatttcatttccttcttcccttcattcttcccaaatgcaaggtctttcaactttcatttcgtgctacactctgccctttattgctgctctctggaatttgtggtcactgtccctcatacactgaaaactcacatacctctacctctagccctgttgtattcctgatgacttgagca[C/T]ccaagggagtgatacatacagcactggtcaatcatttctttacctgccacacatacagcaatctttaatttcaatagccttagccactcattcccaaataatgcttggatcatgcacattatcatgagtaaatacacccatgtctgaaatcctgatttcaagtacttcccaatttttctgtcttttctttactttcagctcacagaaacaattcttccaccatattaaaaactctaatccaattcacttgttccaccactttttttattcattattctctcctgtctttactttcttcct

Key SNP B: rs7703618 deCODE Name: SG05S3065 (SEQ ID NO:140)
gtgaggacacagagccaaaccatttcaccagagggctgagtaactctaatctggcaggatgattatcctacacaggttgcaatggcccctgaaatttggacgcactttgtgagagaccagtgtctagataactaggaactaggtaaatgttggagagctgcttcccttcatttctgtcattgtctgtttcatttcctttgcattgtttgttgatctgtattaaacaaaaatgaaagcaaaccttgtatctgagtctccatttttaccaatcctcacatttatggttcagtgtcttagtct[A/G]gtttcgaataacaagaaccttttgtacttggaagtataaaacttgatagcagcaacattattgatatttagagctcagtacctgtctaattacaggcaggcagaaagaagtgtcaaggtattcttgcttatcaggtcacaggtaatttcttcctctaagaattcataaactgatagactaatattggagaaagaaatgcaatttaattgctgaaagtctgtttcagtttactggtcttgtaatagaggtaaaattctaaacaacttggggagctttggtgagaattaaaataggtgggtg

Key SNP C: rs2067980 deCODE Name: SG05S3114 (SEQ ID NO:160)
gaatatgacgtcatataggcattaatttccatgttatgaattcaccagtaaaattgtttaaacagagaagtaaacaagacggtaatgttattcaggtaaaagtagagagggaaaagaaatattggaaccagttcagcaaccaaaatggtgccagagcccaagcatgagttattaaaggctggtggttcctctctcctgacccattaccattcttatctctgatgctccaggctgtcagtttctttcttttttgaccatatacaggtaaggaaagcccatttatgagctattttatttcca[A/G]gttttaaaaatgtcaattgatataggctatgatctacagtaatgcttaatctattgaagtttttgcatcaaattccatcttaagatgcaagcctgaagcccatttaatgccaaatgtaaatacaagtgctagtttcaaagggcaagattcaaagaaagacaaacagaagaaaagtattttaattgctatctaaaagaaggctgtgttcttgggtgaatactttgttgatgtatttggggtagaaacagagggagaaataattatgtaatgttaagctgttttctaaaattccagggctcc

Key SNP D: rs10035564 deCODE Name: SG05S3104
acaataagtttttagtgatattagatttttttcatttttggaagaagaacagaaaaagtgtaaaaagatggaataatatagaaaatggtagctggaggattcaaagaagaactcacttttatcatgtcaaagctaaaatataaattgtagattttgcatatgtacaatgagcagaaacacatagctgaagaaagaagtgtgctaaataaataaatgaagtattaatgattgagcagagtcttagaaagttggacatgttaagagcattgaatctatttagcctttcatgccatgcccaaa[A/G]tcagaattttaacctatactaggactttaagacaaaaaataggcaaacaaaatcacaaagtgttacaattgacatatgcagtgaattgtttcccttaaaaacaacattttttttttagttatatcactactataaaatttattcttcaacaggcaactaaacgtaatctggtttaatctttttttataaaggaacattttaaagtaattcttttctcctaacagaccatctattttcctctaaatctctttagctttaatatctattttagcgataaacagtgcatgaaataaacagctc

Key SNP E: rs11743392 deCODE Name: SG05S3093
aatctttcaaatatattcatctctcacttatttagggactgattatccaatttgtgaactatccctgtggcttctcctcttttctctaatgatttctcctctgcctatttccttaaatcgttttaatactaaatgagctgcatgaaaacagaaaagaagctaaagcagcaaaatttgatacatataaacagtactgcaaaagaatttcatttgtgctcatatgtttttgaattttcaattttctgttaccccacttccatatttcacactccagattatgtcaccccacccaactcccaa[C/T]aatttgaaattcaaatttggaaattcatctattggttcatttagttggaaactgcatattcacaggtggagagtggaatatatttcaaaaccacagagaaaaaaaaaaaaacgtaattcaacttcgttaatttgtttttaattttccaaagctggaaattgtctctatatctcaattgatgagtttctgagctaaaaacaaaacaaaacaaaacaaaacatcatttcctgtaaccagatttcactgctttcattctaagcaagatgatataaataacaatgagtagtcaagtatttattc

Key SNP F: rs7716600 deCODE Name: SG05S3097 (SEQ ID NO:125)
aggcctaatggttgtatatatatatttttttatttggtagcagaaaagactttaaaatatgttgatgtttgcgaggtaaagcatctatgtagggcattactatcaaggctttttttttctgcttgagtctatattacaaacattttattatgtctctgctgagattaatttaaatgtgcaaattttcaattcctaatataaagataaaatgtaaagttgatccaaaaatacaaaaaaagtgataaaacttagtttgtaatatagactcatatatcatatttttagttctatttcaatgct[A/G]tctagaatttttatcattgctttttacctgaagattcaaattgttttggcatcagtcgggaaatcagtttgtttagctagcaaaaatagacattaataaataaacccagaatacttagaagagatagatagggacccagatctctcaagaaatacggctacagctaattgctatttctacacaaattaacaagcaagctataaactggcatgtgggattttttttttttttttttctctgagacaaggtttcactctctctcccagacgggagtgcagtggtaccatcttggttcagggc

TABLE 5 SNP and Centaurus Assay Description:
SNP name: SG05S3065 or rs7703618
Mapping information (Build 34): chr5:44.960080+
Assay: SG05S3065.c1 of type CENTAURUS, status Verified
Forward Primer: GCAAACCTTGTATCTGAGTCTCCAT
Reverse Primer: GTGACCTGATAAGCAAGAATACCT
Vic probe: CTTA*GTCTGGT
Fam probe: TCT*T*A*GTCTAGT
Enhancer: TCGAATAACAAGAACC
* indicates a modified base as described in [Kutyavin, et al., (2006), Nucleic Acids Res, 34, e128].

Example 2 Refinement of Association Signal on Chromosome 5p12, Correlation with Clinical Variables and Investigation of FGFR2 Locus on Chromosome 10q26

The signals we have identified on chromosome 5p12 localize to a large stretch of chromosome 5p12-11 exhibiting a low recombination rate. From this region we selected 10 SNPs from the Illumina Hap300 chip set, generated Centaurus SNP assays [Kutyavin, et al., (2006), Nucleic Acids Res, 34, e128] and typed them in additional samples from Iceland and in replication samples from Sweden, Holland, Spain and the United States. In total, 5028 cases and 32090 controls of European ancestry were studied. The most strongly associated Illumina SNP in the region was rs4415084, whose T allele gave a combined odds ratio (OR) of 1.16 and a P value of 6.4×10⁻¹⁰, which meets the Bonferroni criterion for genome-wide significance (Table 6). In the replication samples alone, rs4415084 gave an OR of 1.14 (P=7.5×10⁻⁵). To refine the signal, we typed a further 11 SNPs that were not on the Illumina chip but were in LD with Hap300 SNPs giving a substantial signal. Data from these SNPs are presented in Tables 6 and 7. The strongest overall signal (OR 1.19, P=2.9×10⁻¹¹) originated from the G allele of rs10941679, a non-Illumina SNP that is correlated with rs4415084 (D′=0.99, r²=0.51 in the Icelandic population, Table 8).
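Under the multiplicative model used in this study, a per-allele odds ratio and its log-normal (Woolf) confidence interval can be computed from a 2×2 table of allele counts, and genome-wide significance can be judged against a Bonferroni-corrected threshold. The sketch below is illustrative only: the function names are hypothetical, the allele counts in the usage lines are invented, and the figure of 300,000 tests is an assumed round number rather than the exact number of markers analysed. The combined ORs and P values quoted above come from the Mantel-Haenszel likelihood model described under Statistical Analyses, not from this single-table calculation.

```python
import math


def allelic_odds_ratio(case_risk: int, case_other: int,
                       ctrl_risk: int, ctrl_other: int,
                       z: float = 1.96) -> tuple[float, float, float]:
    """Per-allele OR with a log-normal (Woolf) 95% confidence interval.

    Each argument counts chromosomes (two per person) carrying the risk
    allele or the other allele, in cases or in controls.
    """
    counts = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(counts) <= 0:
        raise ValueError("all allele counts must be positive")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) under the log-normal approximation.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P value threshold that controls the family-wise error rate."""
    return alpha / n_tests


# Hypothetical allele counts, for illustration only.
or_, lo, hi = allelic_odds_ratio(case_risk=1200, case_other=800,
                                 ctrl_risk=1100, ctrl_other=900)
print(f"OR={or_:.3f} (95% CI {lo:.3f}-{hi:.3f})")

threshold = bonferroni_threshold(0.05, 300_000)  # assumed number of tests
print(6.4e-10 < threshold)  # True: the rs4415084 P value clears it
```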

Allele G-rs10941679 is less common than T-rs4415084 and is almost completely contained on the T-rs4415084 background. However, in a multivariate analysis, T-rs4415084 retained nominal significance after adjustment for G-rs10941679 (P=0.042), and vice versa (P=0.0017, Table 9). Therefore, despite being highly correlated, neither SNP fully accounts for the observed signal at 5p12. We concluded that both T-rs4415084 and G-rs10941679 confer risk of breast cancer. Multivariate analysis revealed that the signal from SNP rs7703618 could be accounted for entirely by either T-rs4415084 or G-rs10941679 (Table 9).

We reviewed the medical records of the patients, where available, and analyzed the combined data from the Icelandic and replication sample sets for the two risk variants at 5p12 and for marker rs1219648 at the FGFR2 locus on chromosome 10. All three variants conferred significantly greater risk of estrogen receptor (ER) positive breast cancer than of ER negative tumours (Tables 6 and 10). A similar preferential risk of progesterone receptor positive tumours was seen for the 5p12 variants (Table 10). We previously reported that susceptibility variants on 2q35 and 16q12 are particularly associated with ER positive tumours [Stacey, et al., (2007), Nat Genet]. The present findings add further support to the notion that ER positive and ER negative tumours have different genetic components to their risk.

The 5p12 SNPs also showed associations with lower histological grade, which were explained by the association with ER status in multivariate analysis. The FGFR2 SNP was more frequent in node positive than in node negative tumours. There were no significant associations with tumour stage or histopathology (Table 10). No variant showed a significant association with age at diagnosis. The FGFR2 SNP was associated with a family history of breast cancer, in line with previous reports [Huijts, et al., (2007), Breast Cancer Res, 9, R78; Easton, et al., (2007), Nature 447:1087-93]. Similar tendencies, though not statistically significant, were observed for the 5p12 SNPs (Table 11).

Methods

Patient and Control Selection:

Collection of blood samples and medical information from study subjects was conducted with informed consent and ethical review board approval in accordance with the Declaration of Helsinki.

Iceland:

Records of breast cancer diagnoses were obtained from the Icelandic Cancer Registry (ICR). The ICR contains all cases of invasive breast tumours and ductal or lobular carcinoma in-situ diagnosed in Iceland from Jan. 1, 1955. All prevalent cases living in Iceland who had a diagnosis entered into the ICR up to the end of December 2006 were eligible to participate in the study. The ICR contained records of 4785 individuals diagnosed during this period. Consent, samples and successful genotypes were obtained from approximately 2277 patients. Of these, genotypes were derived from Illumina Hap300 chips for 1791 patients and from Centaurus assays for 486 patients. The 26,199 Icelandic controls were selected from ongoing Illumina-based genome-wide association studies at deCODE. Individuals with a diagnosis of breast cancer in the ICR were excluded. Both males and females were included. In the Icelandic controls (and in the foreign replication control groups described below) there were no significant differences between the sexes in the frequencies of the SNPs listed in Table 6. We therefore considered that these control groups provided reasonable representations of the population frequencies of the SNPs under investigation.

Spain:

The Spanish study patients were recruited from the Oncology Department of Zaragoza Hospital between March 2006 and August 2007. Genotyping was carried out satisfactorily on approximately 642 patients. The 1540 successfully genotyped controls had attended the University Hospital in Zaragoza for diseases other than cancer. Controls were questioned to rule out prior cancers before drawing the blood sample. All patients and controls were of European ethnicity.

Sweden:

The Swedish sample sets consisted of Familial and Consecutive patient series. The Familial breast cancer recruitment group consisted of 347 breast cancer patients who had been referred to the oncogenetic counselling clinic of the Karolinska University Hospital, Stockholm, for investigation of a family history of breast cancer. Each patient came from a distinct family. All cases who met the current criteria for BRCA mutation screening had tested negative. The Consecutive breast cancer recruitment group comprised 482 consecutively recruited patients who were treated surgically for primary invasive breast cancer at the Departments of Oncology at Huddinge and Söder Hospitals (covering the population of southern Stockholm) from October 1998 to May 2000. Family history was not taken into account in the selection of patients for recruitment. Controls were 1302 blood donors and 448 cancer-free individuals of both sexes. All controls were collected at the Karolinska University Hospital, Stockholm. There was no evidence of significant heterogeneity between the Familial and Consecutive series for any of the SNPs tested.

Holland:

Female patients diagnosed with breast cancer in the period 2005-2006 were selected from the regional cancer registry held by the Comprehensive Cancer Centre East in Nijmegen, the Netherlands. This cancer centre keeps a population-based cancer registry and covers the eastern part of the Netherlands, a region with 1.3 million inhabitants. All patients diagnosed with breast cancer before the age of 70 were invited to participate in the study. The Comprehensive Cancer Centre East collected the clinical and pathology data for all patients in the cancer registry. These standard cancer registry data were supplemented with more detailed data extracted from the medical files in the hospitals where the patients were treated. Controls were collected in a survey in 2002-2003 by the Radboud University Nijmegen Medical Center. This survey, The Nijmegen Biomedical Study, was based on an age-stratified random sample of the population of Nijmegen. From this group, 2034 control individuals, frequency-matched by age to the patient population, were selected and genotyped.

U.S. Multiethnic Cohort:

The Multiethnic Cohort study (MEC) consists of over 215,000 men and women in Hawaii and Los Angeles (with additional African-Americans from elsewhere in California). The cohort comprises predominantly African Americans, Native Hawaiians, Japanese Americans, Latinos and European Americans who entered the study between 1993 and 1996 by completing a 26-page self-administered questionnaire that asked for detailed information about dietary habits, demographic factors, personal behaviors, history of prior medical conditions, family history of common cancers and, for women, reproductive history and exogenous hormone use. The participants were aged between 45 and 75 at enrolment. Incident cancers in the MEC are identified by cohort linkage to the population-based Surveillance, Epidemiology and End Results (SEER) cancer registries covering Hawaii and Los Angeles County, and to the California State cancer registry covering all of California. Beginning in 1994, blood samples were collected from incident breast cancer cases and from a random sample of MEC participants to serve as a control pool for genetic analyses in the cohort. Eligible cases in the nested breast cancer case-control study were women with incident invasive cancer diagnosed after enrolment in the MEC through Dec. 31, 2002. Controls were participants without breast cancer prior to entry into the cohort and without a diagnosis up to Dec. 31, 2002. Controls were frequency matched to cases based on race/ethnicity and age (in 5-year intervals).
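Frequency matching on race/ethnicity and 5-year age intervals amounts to grouping individuals into strata and drawing controls from each stratum in proportion to the number of cases in it. The following is a minimal sketch under assumed conventions: the record layout, the 1:1 default ratio, the function names and the use of simple random sampling within strata are all hypothetical and are not taken from the MEC protocol.

```python
import random
from collections import Counter, defaultdict

# Hypothetical record layout: (subject_id, ethnicity, age_in_years).
Record = tuple[str, str, int]


def stratum(ethnicity: str, age: int, width: int = 5) -> tuple[str, int]:
    """Key a subject by ethnicity and the lower bound of a 5-year age band."""
    return ethnicity, (age // width) * width


def frequency_match(cases: list[Record], pool: list[Record],
                    ratio: int = 1, seed: int = 0) -> list[str]:
    """Draw controls so that each stratum holds `ratio` controls per case,
    or as many as the pool allows when a stratum is short."""
    needed = Counter(stratum(eth, age) for _, eth, age in cases)
    available: dict[tuple[str, int], list[str]] = defaultdict(list)
    for subject_id, eth, age in pool:
        available[stratum(eth, age)].append(subject_id)

    rng = random.Random(seed)  # fixed seed for a reproducible draw
    chosen: list[str] = []
    for key, n_cases in needed.items():
        candidates = available.get(key, [])
        k = min(len(candidates), n_cases * ratio)
        chosen.extend(rng.sample(candidates, k))
    return chosen


cases = [("c1", "Japanese American", 47), ("c2", "Japanese American", 49),
         ("c3", "Latino", 61)]
pool = [("p1", "Japanese American", 45), ("p2", "Japanese American", 46),
        ("p3", "Japanese American", 52), ("p4", "Latino", 63),
        ("p5", "Latino", 70)]
print(sorted(frequency_match(cases, pool)))  # ['p1', 'p2', 'p4']
```

Controls p3 (age band 50-54) and p5 (age band 70-74) are not drawn because no case falls in their strata.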

Nigeria:

We obtained genotypes from 689 incident breast cancer cases and 469 controls from Ibadan, Nigeria. Cases were consecutively recruited at presentation and later histologically confirmed in the Departments of Surgery and Radiotherapy, University College Hospital, Ibadan, Nigeria. This hospital serves a catchment area of 3 million people and is an oncology referral centre for other hospitals in the region. Population-based controls were recruited randomly from the community adjoining the hospital. After a community consultation process, control subjects were invited to attend a clinic set up for the purposes of the study. Controls were cancer-free at recruitment and over 18 years of age.

Genotyping

Approximately 1791 Icelandic patients and 26199 controls were genotyped on Illumina Hap300 SNP arrays, as described previously [Stacey, et al., (2007), Nat Genet 39:865-9]. All other genotyping was carried out using Nanogen Centaurus assays [Kutyavin, et al., (2006), Nucleic Acids Res, 34, e128] that were generated for the SNPs shown in Table 7. Primer sequences are available on request. Centaurus SNP assays were validated by genotyping the HapMap CEU samples and comparing the genotypes with published data. Assays were rejected if they showed ≥1.5% mismatches with the HapMap data. Approximately 10% of the Icelandic case samples were genotyped on both the Illumina and Nanogen platforms, and the observed mismatch rate was lower than 0.5%. All genotyping was carried out at the deCODE Genetics facility. All physical coordinates are given according to NCBI Build 35.
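Both the assay validation rule (reject an assay with ≥1.5% mismatches against the HapMap CEU genotypes) and the cross-platform concordance check reduce to a genotype mismatch rate over samples called by both sources. A minimal sketch follows; the dictionary layout, the made-up sample IDs, the function names and the choice to skip missing calls are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical layout: sample ID -> unordered genotype such as "CT",
# or None when no call was made.
Calls = dict[str, Optional[str]]


def _canonical(genotype: str) -> str:
    # "TC" and "CT" denote the same genotype; sort alleles before comparing.
    return "".join(sorted(genotype))


def mismatch_rate(assay: Calls, reference: Calls) -> float:
    """Fraction of samples called by both sources whose genotypes differ."""
    shared = [s for s, g in assay.items()
              if g is not None and reference.get(s) is not None]
    if not shared:
        raise ValueError("no samples were called by both sources")
    mismatches = sum(_canonical(assay[s]) != _canonical(reference[s])
                     for s in shared)
    return mismatches / len(shared)


def assay_accepted(rate: float, max_rate: float = 0.015) -> bool:
    """Reject assays with a mismatch rate of 1.5% or more."""
    return rate < max_rate


# Made-up reference calls for 200 samples, all heterozygous.
hapmap = {f"NA{i:05d}": "CT" for i in range(200)}
assay = dict(hapmap)
for sample in ("NA00000", "NA00001", "NA00002"):
    assay[sample] = "CC"  # three discordant calls out of 200
rate = mismatch_rate(assay, hapmap)
print(rate, assay_accepted(rate))  # 0.015 False
```

Because the rule rejects at exactly 1.5%, the acceptance test uses a strict `<` comparison.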

Clinical Parameters

Estrogen and progesterone receptor status was derived from immunohistochemical or immunometric assay results reported in medical records. A receptor level of ≥10 fmol/mg or an immunohistochemical observation of ≥10% positive nuclei was considered positive. Stage was determined according to the American Joint Committee on Cancer staging system, 6th Edition. Histological subtype was determined from SNOMED-M (or equivalent ICD-O) codes as follows: “Invasive Ductal Carcinoma”: 8500/3, 8521/3; “DCIS” (Ductal Carcinoma In-Situ and related in-situ carcinomas): 8500/2, 8050/2, 8201/2, 8501/2, 8503/2, 8507/2, 8522/2; “Invasive Lobular Carcinoma”: 8520/3; “LCIS” (Lobular Carcinoma In-Situ): 8520/2; “Tubular or Mucinous”: 8211/3, 8480/3, 8481/3; “Medullary Carcinoma”: 8510/3, 8512/3; “Mixed Invasive”: 8522/3, 8523/3, 8524/3, 8541/3, 8543/3; “Other Invasive”: 8050/3, 8141/3, 8200/3, 8260/3, 8323/3, 8401/3, 8490/3, 8501/3, 8503/3, 8504/3, 8530/3, 8540/3. Tumours with the following non-specific codes were excluded from the analysis of histopathological types: 8000/3, 8010/2, 8010/3, 8010/6, 8020/3, 8140/2, 8140/3, 8230/3. Histological grade was specified according to the Nottingham (Elston-Ellis modification of the Scarff-Bloom-Richardson) system. Node status was analyzed for stages I to IIIB and was based on pathological staging obtained by axillary lymph node dissection and/or sentinel node biopsy. The Swedish Familial sample set was not used in the analysis of clinical parameters.
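The receptor-positivity rule and the grouping of morphology codes above are mechanical enough to express as lookups. The sketch below encodes the thresholds and code lists exactly as listed in this section; the function names are hypothetical, and calling a tumour positive when either available measurement meets its threshold is an assumption about how discordant reports would be handled.

```python
from typing import Optional

HISTOLOGY_GROUPS = {
    "Invasive Ductal Carcinoma": ["8500/3", "8521/3"],
    "DCIS": ["8500/2", "8050/2", "8201/2", "8501/2", "8503/2", "8507/2",
             "8522/2"],
    "Invasive Lobular Carcinoma": ["8520/3"],
    "LCIS": ["8520/2"],
    "Tubular or Mucinous": ["8211/3", "8480/3", "8481/3"],
    "Medullary Carcinoma": ["8510/3", "8512/3"],
    "Mixed Invasive": ["8522/3", "8523/3", "8524/3", "8541/3", "8543/3"],
    "Other Invasive": ["8050/3", "8141/3", "8200/3", "8260/3", "8323/3",
                       "8401/3", "8490/3", "8501/3", "8503/3", "8504/3",
                       "8530/3", "8540/3"],
}
# Non-specific codes excluded from the histopathology analysis.
EXCLUDED_CODES = {"8000/3", "8010/2", "8010/3", "8010/6", "8020/3",
                  "8140/2", "8140/3", "8230/3"}
CODE_TO_GROUP = {code: group for group, codes in HISTOLOGY_GROUPS.items()
                 for code in codes}


def histology_group(code: str) -> Optional[str]:
    """Map a morphology code to its subtype, or None if excluded/unlisted."""
    if code in EXCLUDED_CODES:
        return None
    return CODE_TO_GROUP.get(code)


def receptor_positive(fmol_per_mg: Optional[float] = None,
                      pct_positive_nuclei: Optional[float] = None
                      ) -> Optional[bool]:
    """ER/PR call: >=10 fmol/mg or >=10% positive nuclei counts as positive.

    Returns None when neither measurement is available.
    """
    if fmol_per_mg is None and pct_positive_nuclei is None:
        return None
    by_assay = fmol_per_mg is not None and fmol_per_mg >= 10
    by_ihc = pct_positive_nuclei is not None and pct_positive_nuclei >= 10
    return by_assay or by_ihc


print(histology_group("8520/3"))  # Invasive Lobular Carcinoma
print(receptor_positive(pct_positive_nuclei=35))  # True
```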

Statistical Analyses

We calculated the OR for each SNP allele assuming the multiplicative model, i.e. assuming that the relative risks of the two alleles a person carries multiply. Allelic frequencies and ORs are presented for the markers. The associated P values were calculated with the standard likelihood ratio χ² statistic as implemented in the NEMO software package (Gretarsdottir S., et al., Nat. Genet. 35: 131-38 (2003)). Confidence intervals were calculated assuming that the estimate of the OR has a log-normal distribution. For SNPs in strong LD, whenever the genotype of one SNP was missing for an individual, the genotypes of the correlated SNPs were used to impute it through a likelihood approach as previously described (Gretarsdottir S., et al., Nat. Genet. 35: 131-38 (2003)). This ensured that results presented for different SNPs were based on the same number of individuals, allowing meaningful comparisons of ORs and P values. Joint analyses of multiple case-control replication groups were carried out using a Mantel-Haenszel model in which the groups were allowed to have different population frequencies for alleles or genotypes but were assumed to have common relative risks. The tests of heterogeneity were performed by assuming that the allele frequencies were the same in all groups under the null hypothesis, but that each group had a different allele frequency under the alternative hypothesis. Joint analyses of multiple groups of cases were performed using an extended Mantel-Haenszel model that corresponds to a polytomous logistic regression using the group indicator as a covariate. There was no evidence of heterogeneity between the replication sample sets for any of the SNPs tested. Associations of risk variants with age at diagnosis and with histological grade were tested by linear regression of the parameter value on the number of copies of the risk allele carried by each individual.
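The allelic OR, its log-normal confidence interval, and a pooled OR across sample sets can be sketched from allele counts as below. This is an illustration under stated assumptions: the interval uses Woolf's standard error for log(OR), and the pooled estimate is the classical Mantel-Haenszel OR estimator; the P values reported in the tables come from likelihood ratio tests in NEMO, which this sketch does not reproduce. All function names are hypothetical.

```python
import math
from typing import List, Tuple

# (risk alleles in cases, other alleles in cases,
#  risk alleles in controls, other alleles in controls)
AlleleTable = Tuple[float, float, float, float]


def allelic_or(table: AlleleTable, z: float = 1.959964) -> Tuple[float, float, float]:
    """Allelic OR with a 95% CI that treats log(OR) as normally distributed."""
    a, b, c, d = table
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def mantel_haenszel_or(strata: List[AlleleTable]) -> float:
    """Common OR across sample sets, each set keeping its own allele frequency."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def table_from_frequencies(n_cases: int, n_controls: int,
                           freq_cases: float, freq_controls: float) -> AlleleTable:
    # Each individual contributes two alleles under the allelic (multiplicative) model.
    case_alleles, control_alleles = 2 * n_cases, 2 * n_controls
    return (freq_cases * case_alleles, (1 - freq_cases) * case_alleles,
            freq_controls * control_alleles, (1 - freq_controls) * control_alleles)


if __name__ == "__main__":
    # Iceland, rs4415084 allele T (counts rebuilt from rounded frequencies).
    iceland = table_from_frequencies(2277, 26199, 0.409, 0.372)
    or_, lo, hi = allelic_or(iceland)
    print(f"OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Rebuilding the Icelandic rs4415084 counts from the rounded frequencies gives an OR that rounds to 1.17, as reported; the interval need not match the published one exactly, because the frequencies are rounded and the Icelandic results were adjusted for relatedness.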

For the analysis of family history we calculated, for each genotype, a familial relative risk for first-degree relatives by adapting our previously described method (Amundadottir L. T., et al., PLoS Med. 1:e65 (2004)) to accommodate genotype-specific familial relative risks (gfRR_(gt)). For each SNP genotype we determined gfRR_(gt) as:

${gfRR}_{gt} = \frac{a/r}{x/n}$

where r is the number of first-degree relatives of breast cancer patients with genotype gt (individuals who are related to more than one patient with genotype gt are counted multiple times) and a is the number of those first-degree relatives who are themselves affected with breast cancer. In the denominator, n is the size of the population and x is the number of people in the population affected with the disease (from ICR records). To compare the observed gfRR_(gt) of one genotype with that of another, we calculated the ratio gfRR_(gt1)/gfRR_(gt2). The significance of these ratios was determined by simulation: control groups for each gfRR_(gt) were drawn randomly from the set of breast cancer patients genotyped for the SNP in question. The control groups were the same size as each corresponding observed gfRR_(gt) group and were matched on the number of parents listed in the Icelandic Genealogical Database (0, 1, or 2). Ten thousand iterations of control-group gfRR_(gt1)/gfRR_(gt2) ratios were calculated, and the P value was determined by counting how often the ratio for the control groups matched or exceeded the observed gfRR_(gt1)/gfRR_(gt2).
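The gfRR calculation and its simulation-based significance test can be sketched as follows, assuming each genotyped patient is summarised by the number of parents recorded in the genealogy, the number of first-degree relatives, and how many of those relatives are affected. The record layout and function names are hypothetical. The text does not say whether the two control groups drawn in one iteration may share patients; here they are drawn independently.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical per-patient record: {"parents": 0, 1 or 2,
#   "relatives": number of first-degree relatives,
#   "affected_relatives": how many of those relatives have breast cancer}
Patient = Dict[str, int]


def gfrr(a: int, r: int, x: int, n: int) -> float:
    """gfRR_gt = (a / r) / (x / n)."""
    return (a / r) / (x / n)


def relative_rate(group: List[Patient]) -> float:
    # Summing per patient counts a relative once for every proband they are related to.
    a = sum(p["affected_relatives"] for p in group)
    r = sum(p["relatives"] for p in group)
    return a / r


def matched_draw(pool: Dict[int, List[Patient]], group: List[Patient],
                 rng: random.Random) -> List[Patient]:
    # Same size as `group`, with the same number of patients per parent-count stratum.
    needed: Dict[int, int] = defaultdict(int)
    for p in group:
        needed[p["parents"]] += 1
    draw: List[Patient] = []
    for parents, count in needed.items():
        draw.extend(rng.sample(pool[parents], count))
    return draw


def gfrr_ratio_test(group1: List[Patient], group2: List[Patient],
                    all_patients: List[Patient], iterations: int = 10_000,
                    seed: int = 0) -> Tuple[float, float]:
    """Observed gfRR_gt1/gfRR_gt2 and its simulated one-sided P value."""
    # x/n is the same for both genotypes, so it cancels in the ratio.
    observed = relative_rate(group1) / relative_rate(group2)
    pool: Dict[int, List[Patient]] = defaultdict(list)
    for p in all_patients:
        pool[p["parents"]].append(p)
    rng = random.Random(seed)
    hits = 0
    for _ in range(iterations):
        rate1 = relative_rate(matched_draw(pool, group1, rng))
        rate2 = relative_rate(matched_draw(pool, group2, rng))
        ratio = float("inf") if rate2 == 0 else rate1 / rate2
        if ratio >= observed:  # "matched or exceeded"
            hits += 1
    return observed, hits / iterations
```

Since the population term x/n is shared, only the per-genotype rates a/r matter for the ratio test; `gfrr` is needed only to report each gfRR_(gt) on its own.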

We calculated genotype-specific ORs by estimating the genotype frequencies in the population assuming Hardy-Weinberg equilibrium. No significant deviations from the multiplicative model were observed. Potential interactions between loci were examined using correlation tests of allele counts and by case-control association of carriers and non-carriers. No significant interactions were observed.
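Under Hardy-Weinberg equilibrium the genotype frequencies follow from the risk-allele frequency alone, and under the multiplicative model the genotype ORs follow from the allelic OR. The sketch below assumes that is how the genotype-specific figures were derived. Expressing genotype risks relative to the population average (rather than to non-carriers) is an additional, commonly used convention that the text does not specify; the function names are hypothetical.

```python
from typing import Dict


def hwe_genotype_frequencies(p: float) -> Dict[int, float]:
    """Genotype frequencies keyed by copies of the risk allele (frequency p)."""
    q = 1.0 - p
    return {0: q * q, 1: 2 * p * q, 2: p * p}


def multiplicative_genotype_ors(allelic_or: float) -> Dict[int, float]:
    # Each copy of the risk allele multiplies risk by the allelic OR.
    return {k: allelic_or ** k for k in (0, 1, 2)}


def risk_relative_to_population(p: float, allelic_or: float) -> Dict[int, float]:
    """Genotype risks rescaled so that the population-average risk equals 1."""
    freqs = hwe_genotype_frequencies(p)
    ors = multiplicative_genotype_ors(allelic_or)
    mean_risk = sum(freqs[k] * ors[k] for k in freqs)
    return {k: ors[k] / mean_risk for k in ors}
```

For example, `risk_relative_to_population(0.396, 1.16)` uses the control frequency and allelic OR of rs4415084 in the combined European-ancestry sample; homozygous carriers then have a risk above the population average and non-carriers a risk below it.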

Some of the Icelandic patients and controls are related to each other, both within and between groups, causing the χ² statistic to have a mean >1. We estimated the inflation factor by simulating genotypes through the Icelandic genealogy, as described previously (Grant S. F., et al., Nat Genet 38:320-3 (2006)), and corrected the χ² statistics for Icelandic ORs accordingly. The estimated inflation factor was 1.08 for the complete set of Icelanders (cases and controls) and smaller, but ≥1, for all the other subsets used in the analysis of the clinical phenotypes.
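A minimal sketch of the correction, assuming it is applied by dividing each 1-degree-of-freedom χ² statistic by the estimated inflation factor (1.08 for the full Icelandic set) before converting it to a P value. Estimating the factor itself requires simulating genotypes through the genealogy and is not shown; the function names are hypothetical.

```python
import math


def chi2_1df_pvalue(statistic: float) -> float:
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


def adjusted_pvalue(statistic: float, inflation: float = 1.08) -> float:
    """Divide the chi-square statistic by the inflation factor, then recompute P."""
    if inflation < 1.0:
        raise ValueError("inflation factor is expected to be >= 1")
    return chi2_1df_pvalue(statistic / inflation)
```

Because the factor is at least 1, the adjusted P value is never smaller than the unadjusted one, which is the conservative direction intended for related samples.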

TABLE 6 Association of SNPs in the 5p12 and 10q26 loci with risk for breast cancer

Sample set | Cases (n) | Controls (n) | Frequency, cases | Frequency, controls | OR^(a) | 95% CI | P^(b)

5p12, rs4415084, allele T
Iceland^(c) | 2277 | 26199 | 0.409 | 0.372 | 1.17 | (1.10, 1.25) | 1.9 × 10⁻⁶
Sweden | 833 | 1750 | 0.443 | 0.417 | 1.11 | (0.99, 1.25) | 8.0 × 10⁻²
Holland | 744 | 2034 | 0.433 | 0.402 | 1.13 | (1.01, 1.28) | 3.9 × 10⁻²
Spain | 642 | 1540 | 0.396 | 0.362 | 1.16 | (1.01, 1.33) | 3.3 × 10⁻²
MEC European Americans | 532 | 567 | 0.471 | 0.424 | 1.21 | (1.01, 1.43) | 3.5 × 10⁻²
Non-Icelanders^(d) | 2751 | 5891 | 0.436 | 0.401 | 1.14 | (1.07, 1.22) | 7.5 × 10⁻⁵
All samples^(d) | 5028 | 32090 | 0.431 | 0.396 | 1.16 | (1.10, 1.21) | 6.4 × 10⁻¹⁰
CGEMS^(e) | 1141 | 1140 | 0.437 | 0.395 | 1.19 | NA | 2.2 × 10⁻³
All ER positive^(d) | 2729 | 32090 | 0.444 | 0.396 | 1.23 | (1.16, 1.30) | 1.8 × 10⁻¹¹
All ER negative^(d) | 744 | 32090 | 0.391 | 0.396 | 0.98 | (0.88, 1.10) | 7.7 × 10⁻¹
All ER positive vs negative^(d) | 2729 | 744 | 0.444 | 0.391 | 1.25 | (1.11, 1.41) | 2.0 × 10⁻⁴

5p12, rs10941679, allele G
Iceland^(c) | 2277 | 26199 | 0.269 | 0.235 | 1.20 | (1.11, 1.29) | 2.2 × 10⁻⁶
Sweden | 833 | 1750 | 0.312 | 0.273 | 1.21 | (1.06, 1.37) | 3.8 × 10⁻³
Holland | 744 | 2034 | 0.298 | 0.258 | 1.22 | (1.07, 1.39) | 3.2 × 10⁻³
Spain | 642 | 1540 | 0.214 | 0.198 | 1.10 | (0.94, 1.30) | 2.3 × 10⁻¹
MEC European Americans | 532 | 567 | 0.293 | 0.253 | 1.23 | (1.02, 1.48) | 3.4 × 10⁻²
Non-Icelanders^(d) | 2751 | 5891 | 0.279 | 0.245 | 1.19 | (1.11, 1.28) | 2.9 × 10⁻⁶
All samples^(d) | 5028 | 32090 | 0.277 | 0.243 | 1.19 | (1.13, 1.26) | 2.9 × 10⁻¹¹
All ER positive^(d) | 2736 | 32090 | 0.288 | 0.243 | 1.27 | (1.19, 1.35) | 2.5 × 10⁻¹²
All ER negative^(d) | 744 | 32090 | 0.254 | 0.243 | 1.05 | (0.92, 1.18) | 4.8 × 10⁻¹
All ER positive vs negative^(d) | 2736 | 744 | 0.288 | 0.254 | 1.21 | (1.06, 1.38) | 4.2 × 10⁻³

5p12, rs7703618, allele G
All samples^(d) | 5028 | 32090 | 0.389 | 0.366 | 1.13 | (1.08, 1.18) | 3.3 × 10⁻⁷

5p12, rs10035564, allele G
All samples^(d) | 5028 | 32090 | 0.312 | 0.301 | 1.10 | (1.04, 1.15) | 5.3 × 10⁻⁴

5p12, rs4866929, allele A
All samples^(d) | 5028 | 32090 | 0.527 | 0.519 | 1.04 | (1.00, 1.09) | 6.7 × 10⁻²

5p12, rs981782, allele T
All samples^(d) | 5028 | 32090 | 0.507 | 0.500 | 1.04 | (0.99, 1.09) | 1.0 × 10⁻¹

10q26, rs1219648, allele G
Iceland^(c) | 2270 | 26190 | 0.492 | 0.453 | 1.17 | (1.10, 1.25) | 1.2 × 10⁻⁶
Sweden | 822 | 1725 | 0.456 | 0.381 | 1.37 | (1.21, 1.54) | 3.0 × 10⁻⁷
Holland | 741 | 2001 | 0.455 | 0.389 | 1.31 | (1.17, 1.48) | 8.7 × 10⁻⁶
Spain | 635 | 1493 | 0.477 | 0.424 | 1.24 | (1.09, 1.41) | 1.5 × 10⁻³
Non-Icelanders^(d) | 2198 | 5219 | 0.463 | 0.398 | 1.31 | (1.22, 1.41) | 1.2 × 10⁻¹³
All samples^(d) | 4468 | 31409 | 0.470 | 0.412 | 1.23 | (1.17, 1.29) | 1.3 × 10⁻¹⁷
All ER positive^(d) | 2354 | 31409 | 0.481 | 0.412 | 1.29 | (1.22, 1.38) | 3.4 × 10⁻¹⁶
All ER negative^(d) | 657 | 31409 | 0.413 | 0.412 | 0.99 | (0.88, 1.10) | 8.3 × 10⁻¹
All ER positive vs negative^(d) | 2354 | 657 | 0.481 | 0.413 | 1.30 | (1.15, 1.47) | 2.9 × 10⁻⁵

^(a)Allelic odds ratios calculated under the multiplicative model.
^(b)All P values are two-sided and have been adjusted for relatedness and other potential stratification of the Icelandic cases and controls.
^(c)Icelandic data are combined Illumina and Centaurus assay-derived replication data sets.
^(d)For analyses of combined data for the “Non-Icelanders”, “All samples” and ER groups, the OR and P values were calculated using the Mantel-Haenszel method, and the frequencies as simple (arithmetic) means of the frequencies of individual groups.
^(e)CGEMS data are displayed for comparative purposes only and were not included in any of the calculations.

TABLE 7 Association with breast cancer for all variants tested in 5p12

Sample set | Cases (n) | Controls (n) | Frequency, cases | Frequency, controls | OR | 95% CI | P

rs4415084, allele T
Iceland | 2277 | 26199 | 0.409 | 0.372 | 1.17 | (1.10, 1.25) | 1.9E−06
Sweden | 833 | 1750 | 0.443 | 0.417 | 1.11 | (0.99, 1.25) | 8.0E−02
Holland | 744 | 2034 | 0.433 | 0.402 | 1.13 | (1.01, 1.28) | 3.9E−02
Spain | 642 | 1540 | 0.396 | 0.362 | 1.16 | (1.01, 1.33) | 3.3E−02
MEC European American | 532 | 567 | 0.471 | 0.424 | 1.21 | (1.01, 1.43) | 3.5E−02
Non-Icelanders | 2751 | 5891 | 0.436 | 0.401 | 1.14 | (1.07, 1.22) | 7.5E−05
All European Ancestry | 5028 | 32090 | 0.431 | 0.396 | 1.16 | (1.10, 1.21) | 6.4E−10
MEC African American | 428 | 457 | 0.630 | 0.641 | 0.95 | (0.78, 1.16) | 6.5E−01
Nigeria | 689 | 469 | 0.689 | 0.648 | 1.20 | (1.00, 1.44) | 4.6E−02

rs10941679, allele G
Iceland | 2277 | 26199 | 0.269 | 0.235 | 1.20 | (1.11, 1.29) | 2.2E−06
Sweden | 833 | 1750 | 0.312 | 0.273 | 1.21 | (1.06, 1.37) | 3.8E−03
Holland | 744 | 2034 | 0.298 | 0.258 | 1.22 | (1.07, 1.39) | 3.2E−03
Spain | 642 | 1540 | 0.214 | 0.198 | 1.10 | (0.94, 1.30) | 2.3E−01
MEC European American | 532 | 567 | 0.293 | 0.253 | 1.23 | (1.02, 1.48) | 3.4E−02
Non-Icelanders | 2751 | 5891 | 0.279 | 0.245 | 1.19 | (1.11, 1.28) | 2.9E−06
All European Ancestry | 5028 | 32090 | 0.277 | 0.243 | 1.19 | (1.13, 1.26) | 2.9E−11
MEC African American | 428 | 457 | 0.218 | 0.213 | 1.03 | (0.82, 1.29) | 8.0E−01
Nigeria | 689 | 469 | 0.175 | 0.191 | 0.90 | (0.72, 1.12) | 3.3E−01

rs7703618, allele G
Iceland | 2277 | 26199 | 0.393 | 0.356 | 1.18 | (1.10, 1.25) | 9.8E−07
Sweden | 833 | 1750 | 0.405 | 0.386 | 1.08 | (0.96, 1.22) | 1.9E−01
Holland | 744 | 2034 | 0.398 | 0.383 | 1.06 | (0.94, 1.20) | 3.3E−01
Spain | 642 | 1540 | 0.324 | 0.313 | 1.05 | (0.91, 1.21) | 4.8E−01
MEC European American | 532 | 567 | 0.427 | 0.391 | 1.16 | (0.98, 1.38) | 9.2E−02
Non-Icelanders | 2751 | 5891 | 0.388 | 0.368 | 1.08 | (1.01, 1.16) | 2.3E−02
All European Ancestry | 5028 | 32090 | 0.389 | 0.366 | 1.13 | (1.08, 1.18) | 3.3E−07
MEC African American | 428 | 457 | 0.349 | 0.348 | 1.01 | (0.83, 1.22) | 9.4E−01
Nigeria | 689 | 469 | 0.327 | 0.335 | 0.96 | (0.81, 1.15) | 6.9E−01

rs10035564, allele G
Iceland | 2277 | 26199 | 0.288 | 0.261 | 1.14 | (1.07, 1.23) | 1.8E−04
Sweden | 833 | 1750 | 0.322 | 0.313 | 1.04 | (0.92, 1.18) | 5.1E−01
Holland | 744 | 2034 | 0.331 | 0.319 | 1.06 | (0.93, 1.20) | 3.9E−01
Spain | 642 | 1540 | 0.309 | 0.310 | 0.99 | (0.86, 1.14) | 9.1E−01
Non-Icelanders | 2751 | 5891 | 0.321 | 0.314 | 1.03 | (0.96, 1.09) | 4.0E−01
All European Ancestry | 5028 | 32090 | 0.312 | 0.301 | 1.10 | (1.04, 1.15) | 5.3E−04
Nigeria | 689 | 469 | 0.767 | 0.739 | 1.17 | (0.95, 1.43) | 1.3E−01

rs4866929, allele A
Iceland | 2277 | 26199 | 0.484 | 0.468 | 1.07 | (1.00, 1.14) | 4.6E−02
Sweden | 833 | 1750 | 0.515 | 0.514 | 1.01 | (0.90, 1.13) | 9.3E−01
Holland | 744 | 2034 | 0.549 | 0.551 | 0.99 | (0.88, 1.12) | 8.8E−01
Spain | 642 | 1540 | 0.502 | 0.489 | 1.05 | (0.93, 1.20) | 4.2E−01
MEC European American | 532 | 567 | 0.585 | 0.572 | 1.05 | (0.89, 1.25) | 5.4E−01
Non-Icelanders | 2751 | 5891 | 0.538 | 0.531 | 1.02 | (0.96, 1.09) | 5.6E−01
All European Ancestry | 5028 | 32090 | 0.527 | 0.519 | 1.04 | (1.00, 1.09) | 6.7E−02
MEC African American | 428 | 457 | 0.884 | 0.893 | 0.92 | (0.68, 1.24) | 5.7E−01
Nigeria | 689 | 469 | 0.999 | 0.999 | NA | NA | NA

rs981782, allele T
Iceland | 2277 | 26199 | 0.472 | 0.458 | 1.06 | (0.99, 1.13) | 8.4E−02
Sweden | 833 | 1750 | 0.512 | 0.509 | 1.01 | (0.90, 1.14) | 8.4E−01
Holland | 744 | 2034 | 0.544 | 0.543 | 1.00 | (0.89, 1.13) | 9.5E−01
Spain | 642 | 1540 | 0.501 | 0.489 | 1.05 | (0.92, 1.19) | 5.0E−01
Non-Icelanders | 2751 | 5891 | 0.519 | 0.514 | 1.02 | (0.95, 1.10) | 6.2E−01
All European Ancestry | 5028 | 32090 | 0.507 | 0.500 | 1.04 | (0.99, 1.09) | 1.0E−01
Nigeria | 689 | 469 | 0.999 | 0.999 | NA | NA | NA

rs4613718, allele T
Iceland | 2277 | 26199 | 0.618 | 0.591 | 1.12 | (1.04, 1.21) | 3.5E−03
Sweden | 833 | 1750 | 0.616 | 0.594 | 1.10 | (0.97, 1.24) | 1.3E−01
Holland | 744 | 2034 | 0.637 | 0.593 | 1.20 | (1.06, 1.36) | 3.0E−03
Spain | 642 | 1540 | 0.635 | 0.609 | 1.12 | (0.98, 1.28) | 9.8E−02
Non-Icelanders | 2751 | 5891 | 0.629 | 0.599 | 1.16 | (1.08, 1.24) | 6.4E−05
All European Ancestry | 5028 | 32090 | 0.627 | 0.597 | 1.14 | (1.08, 1.20) | 1.3E−06
Nigeria | 689 | 469 | 0.763 | 0.755 | 1.05 | (0.86, 1.28) | 6.4E−01

rs994793, allele G
Iceland | 2277 | 26199 | 0.402 | 0.367 | 1.16 | (1.09, 1.24) | 5.2E−06
Sweden | 833 | 1750 | 0.434 | 0.414 | 1.09 | (0.97, 1.22) | 1.7E−01
Holland | 744 | 2034 | 0.438 | 0.411 | 1.12 | (0.99, 1.26) | 7.2E−02
Spain | 642 | 1540 | 0.374 | 0.352 | 1.10 | (0.96, 1.26) | 1.6E−01
Non-Icelanders | 2751 | 5891 | 0.415 | 0.392 | 1.11 | (1.03, 1.19) | 4.9E−03
All European Ancestry | 5028 | 32090 | 0.412 | 0.386 | 1.14 | (1.09, 1.20) | 6.0E−08
Nigeria | 689 | 469 | 0.623 | 0.611 | 1.05 | (0.88, 1.25) | 5.9E−01

rs6867533, allele T
Iceland | 2277 | 26199 | 0.398 | 0.363 | 1.16 | (1.09, 1.24) | 4.7E−06
Sweden | 833 | 1750 | 0.431 | 0.412 | 1.08 | (0.96, 1.22) | 1.9E−01
Holland | 744 | 2034 | 0.430 | 0.408 | 1.09 | (0.97, 1.23) | 1.4E−01
Spain | 642 | 1540 | 0.370 | 0.349 | 1.10 | (0.96, 1.26) | 1.8E−01
Non-Icelanders | 2751 | 5891 | 0.410 | 0.390 | 1.10 | (1.02, 1.18) | 9.8E−03
All European Ancestry | 5028 | 32090 | 0.407 | 0.383 | 1.14 | (1.08, 1.19) | 1.1E−07
Nigeria | 689 | 469 | 0.545 | 0.528 | 1.07 | (0.90, 1.27) | 4.3E−01

rs7716600, allele A
Iceland | 2277 | 26199 | 0.236 | 0.209 | 1.17 | (1.08, 1.26) | 5.7E−05
Sweden | 833 | 1750 | 0.265 | 0.239 | 1.15 | (1.01, 1.32) | 3.8E−02
Holland | 744 | 2034 | 0.254 | 0.235 | 1.11 | (0.97, 1.27) | 1.4E−01
Spain | 642 | 1540 | 0.177 | 0.177 | 1.00 | (0.84, 1.19) | 9.8E−01
Non-Icelanders | 2751 | 5891 | 0.232 | 0.217 | 1.11 | (1.02, 1.20) | 1.8E−02
All European Ancestry | 5028 | 32090 | 0.233 | 0.215 | 1.15 | (1.08, 1.21) | 1.8E−06
Nigeria | 689 | 469 | 0.169 | 0.194 | 0.85 | (0.68, 1.06) | 1.4E−01

rs3935086, allele A
Iceland | 2277 | 26199 | 0.312 | 0.283 | 1.15 | (1.07, 1.24) | 8.0E−05
Sweden | 833 | 1750 | 0.316 | 0.311 | 1.02 | (0.90, 1.16) | 7.5E−01
Holland | 744 | 2034 | 0.321 | 0.321 | 1.00 | (0.88, 1.14) | 9.8E−01
Spain | 642 | 1540 | 0.308 | 0.310 | 0.99 | (0.86, 1.14) | 9.0E−01
MEC European American | 532 | 567 | 0.380 | 0.349 | 1.15 | (0.94, 1.40) | 1.9E−01
Non-Icelanders | 2751 | 5891 | 0.331 | 0.323 | 1.02 | (0.95, 1.10) | 5.6E−01
All European Ancestry | 5028 | 32090 | 0.327 | 0.315 | 1.09 | (1.03, 1.14) | 1.3E−03
CGEMS | NA | NA | NA | NA | NA | NA | NA
MEC African American | 428 | 457 | 0.706 | 0.702 | 1.02 | (0.84, 1.24) | 8.3E−01
Nigeria | 689 | 469 | 0.831 | 0.799 | 1.24 | (1.00, 1.54) | 5.5E−02

rs2067980, allele G
Iceland | 2277 | 26199 | 0.151 | 0.131 | 1.17 | (1.07, 1.28) | 4.7E−04
Sweden | 833 | 1750 | 0.165 | 0.148 | 1.14 | (0.97, 1.34) | 1.2E−01
Holland | 744 | 2034 | 0.159 | 0.165 | 0.96 | (0.82, 1.13) | 6.0E−01
Spain | 642 | 1540 | 0.126 | 0.139 | 0.89 | (0.73, 1.08) | 2.3E−01
MEC European American | NA | NA | NA | NA | NA | NA | NA
Non-Icelanders | 2751 | 5891 | 0.150 | 0.151 | 1.00 | (0.93, 1.09) | 9.5E−01
All European Ancestry | 5028 | 32090 | 0.150 | 0.146 | 1.10 | (1.03, 1.17) | 6.3E−03
Nigeria | 689 | 469 | 0.085 | 0.089 | 0.96 | (0.71, 1.30) | 7.7E−01

rs7731099, allele A
Iceland | 2277 | 26199 | 0.199 | 0.182 | 1.12 | (1.03, 1.22) | 8.2E−03
Sweden | 833 | 1750 | 0.188 | 0.196 | 0.95 | (0.82, 1.10) | 5.0E−01
Holland | 744 | 2034 | 0.200 | 0.198 | 1.02 | (0.88, 1.18) | 8.4E−01
Spain | 642 | 1540 | 0.189 | 0.186 | 1.02 | (0.86, 1.20) | 8.2E−01
MEC European American | NA | NA | NA | NA | NA | NA | NA
Non-Icelanders | 2751 | 5891 | 0.192 | 0.193 | 0.99 | (0.90, 1.10) | 8.9E−01
All European Ancestry | 5028 | 32090 | 0.194 | 0.190 | 1.06 | (1.00, 1.13) | 5.9E−02
Nigeria | 689 | 469 | 0.494 | 0.476 | 1.07 | (0.90, 1.28) | 4.1E−01

rs13183434, allele A
Iceland | 2277 | 26199 | 0.123 | 0.105 | 1.20 | (1.08, 1.33) | 6.3E−04
Sweden | 833 | 1750 | 0.130 | 0.116 | 1.14 | (0.96, 1.37) | 1.4E−01
Holland | 744 | 2034 | 0.130 | 0.136 | 0.95 | (0.80, 1.13) | 5.8E−01
Spain | 642 | 1540 | 0.108 | 0.117 | 0.91 | (0.74, 1.12) | 3.8E−01
Non-Icelanders | 2751 | 5891 | 0.123 | 0.123 | 1.00 | (0.85, 1.19) | 9.6E−01
All European Ancestry | 5028 | 32090 | 0.123 | 0.118 | 1.12 | (1.04, 1.20) | 2.7E−03
Nigeria | 689 | 469 | 0.072 | 0.072 | 1.00 | (0.73, 1.38) | 9.9E−01

rs10512875, allele G
Iceland | 2277 | 26199 | 0.254 | 0.233 | 1.12 | (1.04, 1.21) | 2.1E−03
Sweden | 833 | 1750 | 0.282 | 0.279 | 1.01 | (0.89, 1.15) | 8.4E−01
Holland | 744 | 2034 | 0.297 | 0.283 | 1.07 | (0.94, 1.22) | 3.0E−01
Spain | 642 | 1540 | 0.277 | 0.280 | 0.99 | (0.85, 1.14) | 8.7E−01
Non-Icelanders | 2751 | 5891 | 0.286 | 0.281 | 1.02 | (0.96, 1.09) | 5.3E−01
All European Ancestry | 5028 | 32090 | 0.278 | 0.269 | 1.08 | (1.02, 1.14) | 5.2E−03
Nigeria | 689 | 469 | 0.569 | 0.536 | 1.15 | (0.96, 1.36) | 1.3E−01

rs16902086, allele G
Iceland | 2277 | 26199 | 0.285 | 0.262 | 1.12 | (1.05, 1.21) | 1.3E−03
Sweden | 833 | 1750 | 0.340 | 0.325 | 1.07 | (0.95, 1.21) | 2.8E−01
Holland | 744 | 2034 | 0.335 | 0.329 | 1.03 | (0.91, 1.17) | 6.6E−01
Spain | 642 | 1540 | 0.312 | 0.311 | 1.01 | (0.88, 1.16) | 9.1E−01
MEC European American | 532 | 567 | 0.381 | 0.378 | 1.02 | (0.85, 1.21) | 8.7E−01
Non-Icelanders | 2751 | 5891 | 0.342 | 0.335 | 1.03 | (0.96, 1.11) | 3.5E−01
All European Ancestry | 5028 | 32090 | 0.331 | 0.321 | 1.08 | (1.02, 1.13) | 3.5E−03
MEC African American | 428 | 457 | 0.689 | 0.687 | 1.01 | (0.83, 1.23) | 9.3E−01
Nigeria | 689 | 469 | 0.791 | 0.747 | 1.28 | (1.05, 1.57) | 1.4E−02

rs6861150, allele A
Iceland | 2277 | 26199 | 0.122 | 0.107 | 1.16 | (1.04, 1.30) | 9.9E−03
Sweden | 833 | 1750 | 0.123 | 0.123 | 1.00 | (0.83, 1.19) | 9.7E−01
Holland | 744 | 2034 | 0.122 | 0.121 | 1.01 | (0.84, 1.21) | 9.1E−01
Spain | 642 | 1540 | 0.110 | 0.119 | 0.91 | (0.74, 1.12) | 3.7E−01
Non-Icelanders | 2751 | 5891 | 0.118 | 0.121 | 0.98 | (0.91, 1.07) | 7.0E−01
All European Ancestry | 5028 | 32090 | 0.119 | 0.118 | 1.07 | (0.98, 1.15) | 1.2E−01
Nigeria | 689 | 469 | 0.070 | 0.068 | 1.02 | (0.74, 1.40) | 9.2E−01

rs6451795, allele C
Iceland | 2277 | 26199 | 0.127 | 0.111 | 1.17 | (1.04, 1.30) | 6.7E−03
Sweden | 833 | 1750 | 0.133 | 0.132 | 1.01 | (0.85, 1.20) | 9.1E−01
Holland | 744 | 2034 | 0.130 | 0.130 | 1.00 | (0.84, 1.20) | 9.8E−01
Spain | 642 | 1540 | 0.116 | 0.129 | 0.89 | (0.73, 1.09) | 2.6E−01
Non-Icelanders | 2751 | 5891 | 0.127 | 0.130 | 0.97 | (0.87, 1.08) | 5.8E−01
All European Ancestry | 5028 | 32090 | 0.127 | 0.125 | 1.06 | (0.99, 1.15) | 1.1E−01
Nigeria | 689 | 469 | 0.474 | 0.463 | 1.05 | (0.88, 1.25) | 6.0E−01

rs11743392, allele T
Iceland | 2277 | 26199 | 0.497 | 0.475 | 1.09 | (1.02, 1.16) | 6.7E−03
Sweden | 833 | 1750 | 0.499 | 0.513 | 0.95 | (0.84, 1.06) | 3.6E−01
Holland | 744 | 2034 | 0.542 | 0.554 | 0.95 | (0.85, 1.07) | 4.2E−01
Spain | 642 | 1540 | 0.498 | 0.485 | 1.05 | (0.93, 1.20) | 4.2E−01
Non-Icelanders | 2751 | 5891 | 0.513 | 0.517 | 0.98 | (0.91, 1.05) | 5.6E−01
All European Ancestry | 5028 | 32090 | 0.509 | 0.507 | 1.04 | (0.99, 1.09) | 9.5E−02
Nigeria | 689 | 469 | 0.999 | 0.999 | NA | NA | NA

rs7718785, allele T
Iceland | 2277 | 26199 | 0.408 | 0.372 | 1.17 | (1.09, 1.25) | 1.8E−05
Sweden | 833 | 1750 | 0.382 | 0.406 | 0.90 | (0.80, 1.02) | 9.9E−02
Holland | 744 | 2034 | 0.424 | 0.438 | 0.94 | (0.84, 1.06) | 3.5E−01
Spain | 642 | 1540 | 0.408 | 0.400 | 1.03 | (0.90, 1.18) | 6.5E−01
MEC European American | 532 | 567 | 0.472 | 0.463 | 1.04 | (0.86, 1.26) | 7.0E−01
Non-Icelanders | 2751 | 5891 | 0.422 | 0.427 | 0.96 | (0.90, 1.03) | 2.8E−01
All European Ancestry | 5028 | 32090 | 0.419 | 0.416 | 1.05 | (1.00, 1.11) | 3.1E−02
MEC African American | 428 | 457 | 0.659 | 0.675 | 0.93 | (0.70, 1.23) | 6.1E−01
Nigeria | 689 | 469 | 0.710 | 0.663 | 1.24 | (1.03, 1.49) | 2.0E−02

rs13179818, allele G
Iceland | 2277 | 26199 | 0.505 | 0.530 | 0.90 | (0.85, 0.97) | 2.9E−03
Sweden | 833 | 1750 | 0.507 | 0.492 | 1.06 | (0.95, 1.20) | 3.0E−01
Holland | 744 | 2034 | 0.435 | 0.436 | 1.00 | (0.88, 1.12) | 9.4E−01
Spain | 642 | 1540 | 0.495 | 0.537 | 0.84 | (0.74, 0.96) | 1.1E−02
Non-Icelanders | 2751 | 5891 | 0.479 | 0.488 | 0.97 | (0.90, 1.04) | 4.3E−01
All European Ancestry | 5028 | 32090 | 0.485 | 0.499 | 0.93 | (0.89, 0.98) | 6.0E−03
Nigeria | 689 | 469 | 0.001 | 0.014 | 0.07 | (0.02, 0.28) | 2.1E−04

Allelic odds ratios were calculated under the multiplicative model. All P values are two-sided and have been adjusted for relatedness and other potential stratification of the Icelandic cases and controls. Icelandic data are combined Illumina and Centaurus assay-derived replication data sets. For analyses of combined data for the “Non-Icelanders” and “All European Ancestry” groups, the OR and P values were calculated using the Mantel-Haenszel method, and the frequencies as simple (arithmetic) means of the frequencies of individual groups. CGEMS data are displayed for comparative purposes only and were not included in any of the calculations.

TABLE 8 LD relations between 5p12 SNPs in Iceland. Values above the diagonal are r²; values below the diagonal are D′.

SNP | rs981782 | rs4866929 | rs7703618 | rs10035564 | rs4415084 | rs10941679
rs981782 | NA | 0.94 | 0.10 | 0.38 | 0.06 | 0.01
rs4866929 | 0.99 | NA | 0.11 | 0.37 | 0.07 | 0.02
rs7703618 | 0.39 | 0.43 | NA | 0.46 | 0.81 | 0.45
rs10035564 | 0.95 | 0.96 | 0.85 | NA | 0.37 | 0.13
rs4415084 | 0.29 | 0.32 | 0.94 | 0.79 | NA | 0.51
rs10941679 | 0.18 | 0.24 | 0.90 | 0.39 | 0.99 | NA
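The two LD measures reported in Table 8, r² and D′, can be computed from haplotype and allele frequencies as sketched below. In practice the haplotype frequencies would first be estimated from unphased genotypes (for example with an EM algorithm), which is not shown; the function name is hypothetical.

```python
from typing import Tuple


def ld_measures(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2
```

For instance, with allele frequencies 0.4 and 0.2 and haplotype frequency 0.2, the function gives D′ = 1 but r² = 0.375. Two SNPs can thus be in complete LD by D′ yet have a modest r² when their allele frequencies differ, which is consistent with the pattern reported for rs4415084 and rs10941679.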

TABLE 9 Multivariate analysis of SNPs in 5p12. Each cell gives the P value and OR (P / OR) for the tested variant after adjustment for the variant named in the column heading.

Tested variant | Adjusted for rs4415084 | Adjusted for rs10941679 | Adjusted for rs7703618 | Adjusted for rs10035564 | Adjusted for rs4866929 | Adjusted for rs981782
rs4415084 | NA | 4.2E−02 / 1.07 | 3.8E−04 / 1.19 | 2.8E−07 / 1.16 | 3.6E−09 / 1.16 | 2.6E−09 / 1.16
rs10941679 | 1.7E−03 / 1.13 | NA | 1.8E−05 / 1.17 | 1.3E−08 / 1.18 | 1.4E−10 / 1.19 | 1.0E−10 / 1.19
rs7703618 | 4.9E−01 / 0.97 | 5.2E−01 / 1.02 | NA | 1.8E−04 / 1.13 | 1.9E−06 / 1.13 | 1.4E−06 / 1.13
rs10035564 | 6.9E−01 / 0.99 | 4.6E−01 / 1.02 | 8.9E−01 / 0.99 | NA | 3.6E−03 / 1.10 | 1.9E−03 / 1.11
rs4866929 | 9.8E−01 / 1.00 | 6.2E−01 / 1.01 | 9.2E−01 / 1.00 | 9.5E−01 / 0.99 | NA | 4.0E−01 / 1.06
rs981782 | 9.7E−01 / 1.00 | 6.3E−01 / 1.01 | 8.8E−01 / 1.00 | 5.6E−01 / 0.98 | 7.6E−01 / 0.98 | NA

TABLE 10 Clinical Correlations Number Frequency Sample/Comparison CasesControls Cases Controls OR 95% CI P Allele SNP Estrogen Receptor test:ER positive vs control Iceland 1129 26199 0.428 0.372 1.26 (1.16, 1.38)8.9E−08 4 rs4415084 Sweden 377 1750 0.443 0.417 1.11 (0.95, 1.30)1.9E−01 4 rs4415084 Holland 541 2034 0.441 0.403 1.17 (1.02, 1.34)2.4E−02 4 rs4415084 Spain 320 1540 0.420 0.362 1.28 (1.07, 1.52) 6.4E−034 rs4415084 MEC European American 362 567 0.486 0.424 1.28 (1.06, 1.56)1.1E−02 4 rs4415084 All European Ancestry 2729 32090 0.444 0.396 1.23(1.16, 1.30) 1.8E−11 4 rs4415084 test: ER negative vs control Iceland361 26199 0.373 0.372 1.00 (0.00, inf) 1.0E+00 4 rs4415084 Sweden 771750 0.384 0.417 0.87 (0.62, 1.21) 4.0E−01 4 rs4415084 Holland 125 20340.388 0.402 0.94 (0.72, 1.22) 6.4E−01 4 rs4415084 Spain 98 1540 0.3430.361 0.92 (0.68, 1.25) 5.9E−01 4 rs4415084 MEC European American 83 5670.470 0.424 1.20 (0.86, 1.68) 2.9E−01 4 rs4415084 All European Ancestry744 32090 0.391 0.396 0.98 (0.88, 1.10) 7.7E−01 4 rs4415084 test: ERpositive vs negative Iceland 1129 361 0.428 0.373 1.26 (1.06, 1.50)8.2E−03 4 rs4415084 Sweden 377 77 0.443 0.384 1.28 (0.90, 1.82) 1.7E−014 rs4415084 Holland 541 125 0.441 0.388 1.25 (0.94, 1.65) 1.3E−01 4rs4415084 Spain 320 98 0.420 0.343 1.39 (1.00, 1.94) 5.2E−02 4 rs4415084MEC European American 362 83 0.486 0.470 1.07 (0.75, 1.51) 7.2E−01 4rs4415084 All European Ancestry 2729 744 0.444 0.391 1.25 (1.11, 1.41)2.0E−04 4 rs4415084 test: ER positive vs control Iceland 1134 261990.284 0.236 1.29 (1.17, 1.42) 3.1E−07 3 rs10941679 Sweden 377 1750 0.3070.273 1.18 (0.99, 1.40) 6.4E−02 3 rs10941679 Holland 541 2034 0.3040.258 1.26 (1.09, 1.46) 2.4E−03 3 rs10941679 Spain 320 1540 0.244 0.1971.31 (1.07, 1.61) 9.5E−03 3 rs10941679 MEC European American 364 5670.302 0.253 1.28 (1.04, 1.58) 2.0E−02 3 rs10941679 All European Ancestry2736 32090 0.288 0.243 1.27 (1.19, 1.35) 2.5E−12 3 rs10941679 test: ERnegative vs control Iceland 361 26199 0.242 0.236 1.03 
(0.86, 1.24)7.2E−01 3 rs10941679 Sweden 77 1750 0.266 0.273 0.97 (0.67, 1.39)8.5E−01 3 rs10941679 Holland 125 2034 0.267 0.258 1.05 (0.78, 1.40)7.5E−01 3 rs10941679 Spain 98 1540 0.184 0.197 0.91 (0.63, 1.32) 6.3E−013 rs10941679 MEC European American 83 567 0.313 0.253 1.35 (0.94, 1.93)1.0E−01 3 rs10941679 All European Ancestry 744 32090 0.254 0.243 1.05(0.92, 1.18) 4.8E−01 3 rs10941679 test: ER positive vs negative Iceland1134 361 0.284 0.242 1.25 (1.03, 1.51) 2.5E−02 3 rs10941679 Sweden 37777 0.307 0.266 1.22 (0.83, 1.79) 3.1E−01 3 rs10941679 Holland 541 1250.304 0.267 1.20 (0.89, 1.64) 2.4E−01 3 rs10941679 Spain 320 98 0.2440.184 1.43 (0.96, 2.13) 7.5E−02 3 rs10941679 MEC European American 36483 0.302 0.313 0.95 (0.66, 1.37) 7.7E−01 3 rs10941679 All EuropeanAncestry 2736 744 0.288 0.254 1.21 (1.06, 1.38) 4.2E−03 3 rs10941679test: ER positive vs control Iceland 1126 26190 0.504 0.453 1.23 (1.13,1.34) 1.6E−06 3 rs1219648 Sweden 372 1725 0.466 0.381 1.42 (1.21, 1.67)1.6E−05 3 rs1219648 Holland 539 2001 0.468 0.389 1.39 (1.21, 1.59)2.5E−06 3 rs1219648 Spain 317 1493 0.484 0.424 1.27 (1.07, 1.51) 5.8E−033 rs1219648 MEC European American NA NA NA NA NA (0.00, 0.00) NA 3rs1219648 All European Ancestry 2354 31409 0.481 0.412 1.29 (1.22, 1.38)3.4E−16 3 rs1219648 test: ER negative vs control Iceland 360 26190 0.4400.453 0.95 (0.82, 1.10) 5.0E−01 3 rs1219648 Sweden 76 1725 0.349 0.3810.87 (0.62, 1.22) 4.3E−01 3 rs1219648 Holland 124 2001 0.399 0.389 1.04(0.80, 1.36) 7.5E−01 3 rs1219648 Spain 97 1493 0.464 0.424 1.17 (0.88,1.57) 2.8E−01 3 rs1219648 MEC European American NA NA NA NA NA (0.00,0.00) NA 3 rs1219648 All European Ancestry 657 31409 0.413 0.412 0.99(0.88, 1.10) 8.3E−01 3 rs1219648 test: ER positive vs negative Iceland1126 360 0.504 0.440 1.29 (1.09, 1.53) 2.7E−03 3 rs1219648 Sweden 372 760.466 0.349 1.63 (1.14, 2.34) 7.3E−03 3 rs1219648 Holland 539 124 0.4680.399 1.33 (1.00, 1.75) 4.7E−02 3 rs1219648 Spain 317 97 0.484 0.4641.08 (0.79, 1.50) 6.2E−01 3 rs1219648 
MEC European American NA NA NA NANA (0.00, 0.00) NA 3 rs1219648 All European Ancestry 2354 657 0.4810.413 1.30 (1.15, 1.47) 2.9E−05 3 rs1219648 Progesterone Receptor test:PR positive vs control Iceland 1049 26199 0.422 0.372 1.23 (1.13, 1.35)3.8E−06 4 rs4415084 Sweden 300 1750 0.445 0.417 1.12 (0.94, 1.34)2.0E−01 4 rs4415084 Holland 404 2034 0.442 0.403 1.17 (1.01, 1.37)4.0E−02 4 rs4415084 Spain 269 1540 0.424 0.361 1.30 (1.08, 1.57) 6.0E−034 rs4415084 MEC European American 294 567 0.490 0.424 1.30 (1.06, 1.60)1.2E−02 4 rs4415084 All European Ancestry 2316 32090 0.445 0.396 1.22(1.15, 1.30) 7.3E−10 4 rs4415084 test: PR negative vs control Iceland424 26199 0.393 0.372 1.09 (0.95, 1.25) 2.2E−01 4 rs4415084 Sweden 981750 0.393 0.417 0.90 (0.67, 1.21) 5.0E−01 4 rs4415084 Holland 260 20340.415 0.402 1.05 (0.88, 1.27) 5.8E−01 4 rs4415084 Spain 144 1540 0.3520.362 0.96 (0.74, 1.23) 7.4E−01 4 rs4415084 MEC European American 126567 0.468 0.424 1.19 (0.89, 1.59) 2.3E−01 4 rs4415084 All EuropeanAncestry 1052 32090 0.404 0.396 1.05 (0.96, 1.15) 2.7E−01 4 rs4415084test: PR positive vs negative Iceland 1049 424 0.422 0.393 1.13 (0.96,1.33) 1.4E−01 4 rs4415084 Sweden 300 98 0.445 0.393 1.24 (0.89, 1.72)2.0E−01 4 rs4415084 Holland 404 260 0.442 0.415 1.11 (0.89, 1.39)3.4E−01 4 rs4415084 Spain 269 144 0.424 0.351 1.36 (1.01, 1.84) 4.2E−024 rs4415084 MEC European American 294 126 0.489 0.467 1.09 (0.80, 1.49)5.8E−01 4 rs4415084 All European Ancestry 2316 1052 0.445 0.404 1.16(1.04, 1.29) 6.2E−03 4 rs4415084 test: PR positive vs control Iceland1054 26199 0.284 0.235 1.29 (1.17, 1.43) 5.8E−07 3 rs10941679 Sweden 3001750 0.307 0.273 1.18 (0.98, 1.43) 8.9E−02 3 rs10941679 Holland 404 20340.299 0.258 1.23 (1.04, 1.46) 1.5E−02 3 rs10941679 Spain 269 1540 0.2400.197 1.28 (1.03, 1.60) 2.7E−02 3 rs10941679 MEC European American 296567 0.307 0.253 1.31 (1.05, 1.64) 1.7E−02 3 rs10941679 All EuropeanAncestry 2323 32090 0.288 0.243 1.27 (1.18, 1.36) 7.2E−11 3 rs10941679test: PR negative vs 
control Iceland 424 26199 0.243 0.236 1.04 (0.88,1.22) 6.3E−01 3 rs10941679 Sweden 98 1750 0.265 0.273 0.96 (0.69, 1.33)8.1E−01 3 rs10941679 Holland 260 2034 0.294 0.258 1.20 (0.98, 1.47)8.2E−02 3 rs10941679 Spain 144 1540 0.208 0.197 1.07 (0.79, 1.44)6.6E−01 3 rs10941679 MEC European American 126 567 0.313 0.253 1.35(1.00, 1.83) 5.0E−02 3 rs10941679 All European Ancestry 1052 32090 0.2650.243 1.11 (1.00, 1.23) 5.4E−02 3 rs10941679 test: PR positive vsnegative Iceland 1054 424 0.284 0.243 1.24 (1.03, 1.48) 2.3E−02 3rs10941679 Sweden 300 98 0.307 0.265 1.23 (0.86, 1.76) 2.6E−01 3rs10941679 Holland 404 260 0.300 0.294 1.03 (0.81, 1.31) 8.1E−01 3rs10941679 Spain 269 144 0.240 0.208 1.20 (0.85, 1.69) 3.0E−01 3rs10941679 MEC European American 296 126 0.307 0.313 0.97 (0.71, 1.33)8.5E−01 3 rs10941679 All European Ancestry 2323 1052 0.288 0.265 1.14(1.02, 1.28) 2.7E−02 3 rs10941679 test: PR positive vs control Iceland1047 26190 0.492 0.453 1.17 (1.07, 1.28) 4.6E−04 3 rs1219648 Sweden 2951725 0.456 0.381 1.36 (1.14, 1.63) 5.8E−04 3 rs1219648 Holland 403 20010.457 0.389 1.32 (1.13, 1.54) 3.7E−04 3 rs1219648 Spain 266 1493 0.4770.424 1.24 (1.03, 1.49) 2.3E−02 3 rs1219648 MEC European American NA NANA NA NA (0.00, 0.00) NA 3 rs1219648 All European Ancestry 2011 314090.470 0.412 1.23 (1.15, 1.32) 7.1E−10 3 rs1219648 test: PR negative vscontrol Iceland 423 26190 0.470 0.453 1.07 (0.94, 1.23) 3.1E−01 3rs1219648 Sweden 97 1725 0.407 0.381 1.12 (0.83, 1.50) 4.6E−01 3rs1219648 Holland 258 2001 0.448 0.389 1.27 (1.06, 1.53) 1.0E−02 3rs1219648 Spain 143 1493 0.490 0.424 1.30 (1.02, 1.66) 3.4E−02 3rs1219648 MEC European American NA NA NA NA NA (0.00, 0.00) NA 3rs1219648 All European Ancestry 921 31409 0.454 0.412 1.16 (1.06, 1.28)2.1E−03 3 rs1219648 test: PR positive vs negative Iceland 1047 423 0.4920.470 1.09 (0.93, 1.28) 2.9E−01 3 rs1219648 Sweden 295 97 0.456 0.4071.22 (0.88, 1.69) 2.4E−01 3 rs1219648 Holland 403 258 0.457 0.448 1.04(0.83, 1.29) 7.5E−01 3 rs1219648 Spain 266 143 
0.477 0.490 0.95 (0.71,1.27) 7.4E−01 3 rs1219648 MEC European American NA NA NA NA NA (0.00,0.00) NA 3 rs1219648 All European Ancestry 2011 921 0.470 0.454 1.07(0.96, 1.19) 2.5E−01 3 rs1219648 Histopathology: test: Invasive Ductalvs Control All European Ancestry 2897 32090 0.431 0.396 1.17 (1.10,1.24) 1.6E−07 4 rs4415084 All European Ancestry 2899 32090 0.276 0.2431.18 (1.11, 1.26) 4.7E−07 3 rs10941679 All European Ancestry 2512 314090.465 0.412 1.21 (1.14, 1.29) 2.8E−10 3 rs1219648 test: Invasive Lobularvs Control All European Ancestry 419 32090 0.422 0.396 1.07 (0.93, 1.23)3.4E−01 4 rs4415084 All European Ancestry 419 32090 0.264 0.243 1.13(0.97, 1.33) 1.2E−01 3 rs10941679 All European Ancestry 363 31409 0.5190.412 1.38 (1.19, 1.60) 2.2E−05 3 rs1219648 test: Tubular vs Control AllEuropean Ancestry 187 32090 0.444 0.396 1.20 (0.98, 1.48) 7.8E−02 4rs4415084 All European Ancestry 187 32090 0.321 0.243 1.22 (0.97, 1.53)9.6E−02 3 rs10941679 All European Ancestry 149 31409 0.445 0.412 1.18(0.94, 1.49) 1.5E−01 3 rs1219648 test: Other Invasive vs Control AllEuropean Ancestry 75 30340 0.367 0.390 0.89 (0.64, 1.24) 4.9E−01 4rs4415084 All European Ancestry 75 30340 0.241 0.236 1.01 (0.00, inf)1.0E+00 3 rs10941679 All European Ancestry 58 29684 0.458 0.422 1.24(0.86, 1.79) 2.5E−01 3 rs1219648 test: Mixed Invasive vs Control AllEuropean Ancestry 192 30550 0.461 0.404 1.31 (1.07, 1.61) 8.5E−03 4rs4415084 All European Ancestry 192 30550 0.323 0.255 1.46 (1.17, 1.83)8.0E−04 3 rs10941679 All European Ancestry 147 29916 0.486 0.407 1.35(1.07, 1.71) 1.1E−02 3 rs1219648 test: Medullary vs Control All EuropeanAncestry 43 30340 0.425 0.390 1.15 (0.74, 1.77) 5.4E−01 4 rs4415084 AllEuropean Ancestry 43 30340 0.313 0.236 1.13 (0.68, 1.87) 6.3E−01 3rs10941679 All European Ancestry 42 29684 0.424 0.422 0.99 (0.64, 1.53)9.7E−01 3 rs1219648 test: DCIS vs Control All European Ancestry 27530340 0.468 0.390 1.25 (1.05, 1.49) 1.1E−02 4 rs4415084 All EuropeanAncestry 275 30340 0.268 0.236 1.31 
(1.09, 1.59) | 5.1E−03 | 3 | rs10941679
All European Ancestry | 272 | 29684 | 0.360 | 0.422 | 1.05 (0.88, 1.25) | 5.9E−01 | 3 | rs1219648
test: LCIS vs Control
All European Ancestry | 28 | 29773 | 0.239 | 0.379 | 0.72 (0.41, 1.27) | 2.6E−01 | 4 | rs4415084
All European Ancestry | 28 | 29773 | 0.169 | 0.230 | 0.90 (0.46, 1.78) | 7.7E−01 | 3 | rs10941679
All European Ancestry | 28 | 29684 | 0.492 | 0.422 | 1.18 (0.69, 2.00) | 5.4E−01 | 3 | rs1219648
test: Other Non-invasive vs Control
All European Ancestry | 12 | 28233 | 0.429 | 0.387 | 1.15 (0.50, 2.65) | 7.5E−01 | 4 | rs4415084
All European Ancestry | 12 | 28233 | 0.343 | 0.247 | 1.55 (0.64, 3.77) | 3.4E−01 | 3 | rs10941679
All European Ancestry | 12 | 28191 | 0.493 | 0.421 | 1.17 (0.51, 2.64) | 7.1E−01 | 3 | rs1219648
test: Heterogeneity, All Types
All European Ancestry | NA | NA | NA | NA | NA (0.00, 0.00) | 1.9E−01 | 4 | rs4415084
All European Ancestry | NA | NA | NA | NA | NA (0.00, 0.00) | 5.8E−01 | 3 | rs10941679
All European Ancestry | NA | NA | NA | NA | NA (0.00, 0.00) | 5.8E−01 | 3 | rs1219648
test: Heterogeneity, Invasive Types
All European Ancestry | NA | NA | NA | NA | NA (0.00, 0.00) | 4.4E−01 | 4 | rs4415084
All European Ancestry | NA | NA | NA | NA | NA (0.00, 0.00) | 5.1E−01 | 3 | rs10941679
All European Ancestry | NA | NA | NA | NA | NA (0.00, 0.00) | 6.1E−01 | 3 | rs1219648
Stage
test: Stage 0 (in-situ) vs Control
All European Ancestry | 267 | 29773 | 0.391 | 0.379 | 1.21 (1.02, 1.45) | 2.9E−02 | 4 | rs4415084
All European Ancestry | 267 | 29773 | 0.294 | 0.230 | 1.27 (1.05, 1.55) | 1.5E−02 | 3 | rs10941679
All European Ancestry | 265 | 29684 | 0.399 | 0.422 | 1.02 (0.85, 1.21) | 8.5E−01 | 3 | rs1219648
test: Stage 1 vs Control
All European Ancestry | 1394 | 31523 | 0.412 | 0.388 | 1.19 (1.10, 1.29) | 2.1E−05 | 4 | rs4415084
All European Ancestry | 1394 | 31523 | 0.273 | 0.241 | 1.19 (1.09, 1.30) | 1.5E−04 | 3 | rs10941679
All European Ancestry | 1385 | 31409 | 0.473 | 0.412 | 1.23 (1.13, 1.33) | 4.1E−07 | 3 | rs1219648
test: Stage 2 vs Control
All European Ancestry | 1161 | 31523 | 0.408 | 0.388 | 1.10 (1.01, 1.21) | 2.5E−02 | 4 | rs4415084
All European Ancestry | 1161 | 31523 | 0.272 | 0.241 | 1.22 (1.11, 1.35) | 4.9E−05 | 3 | rs10941679
All European Ancestry | 1156 | 31409 | 0.468 | 0.412 | 1.22 (1.12, 1.33) | 5.3E−06 | 3 | rs1219648
test: Stage 3&4 vs Control
All European Ancestry | 438 | 31523 | 0.424 | 0.388 | 1.12 (0.97, 1.29) | 1.1E−01 | 4 | rs4415084
All European Ancestry | 438 | 31523 | 0.273 | 0.241 | 1.14 (0.97, 1.33) | 1.1E−01 | 3 | rs10941679
All European Ancestry | 435 | 31409 | 0.486 | 0.412 | 1.31 (1.14, 1.50) | 1.1E−04 | 3 | rs1219648
test: Heterogeneity, Stages 1-4
All European Ancestry | NA | NA | NA | NA | NA (0.00, 0.00) | 3.9E−01 | 4 | rs4415084
All European Ancestry | NA | NA | NA | NA | NA (0.00, 0.00) | 6.8E−01 | 3 | rs10941679
All European Ancestry | NA | NA | NA | NA | NA (0.00, 0.00) | 6.3E−01 | 3 | rs1219648
test: All Invasive Stages (1-4) vs Control
All European Ancestry | 3233 | 31523 | 0.416 | 0.388 | 1.15 (1.09, 1.22) | 4.6E−07 | 4 | rs4415084
All European Ancestry | 3233 | 31523 | 0.271 | 0.241 | 1.19 (1.12, 1.27) | 1.6E−08 | 3 | rs10941679
All European Ancestry | 3216 | 31409 | 0.472 | 0.412 | 1.24 (1.17, 1.31) | 7.6E−15 | 3 | rs1219648
test: In-situ (Stage 0) vs Invasive (Stage 1-4)
All European Ancestry | 267 | 2749 | 0.391 | 0.410 | 1.04 (0.86, 1.24) | 7.0E−01 | 4 | rs4415084
All European Ancestry | 267 | 2749 | 0.294 | 0.260 | 1.05 (0.86, 1.29) | 6.0E−01 | 3 | rs10941679
All European Ancestry | 265 | 2739 | 0.399 | 0.480 | 0.84 (0.70, 1.00) | 5.2E−02 | 3 | rs1219648
Grade
test: Grade 1 vs Control
All European Ancestry | 471 | 31523 | 0.443 | 0.388 | 1.26 (1.10, 1.44) | 6.6E−04 | 4 | rs4415084
All European Ancestry | 471 | 31523 | 0.295 | 0.241 | 1.25 (1.08, 1.45) | 2.5E−03 | 3 | rs10941679
All European Ancestry | 467 | 31409 | 0.479 | 0.412 | 1.21 (1.06, 1.39) | 4.4E−03 | 3 | rs1219648
test: Grade 2 vs Control
All European Ancestry | 985 | 31523 | 0.428 | 0.388 | 1.20 (1.09, 1.31) | 1.8E−04 | 4 | rs4415084
All European Ancestry | 985 | 31523 | 0.287 | 0.241 | 1.27 (1.15, 1.41) | 5.5E−06 | 3 | rs10941679
All European Ancestry | 981 | 31409 | 0.476 | 0.412 | 1.31 (1.19, 1.43) | 1.8E−08 | 3 | rs1219648
test: Grade 3 vs Control
All European Ancestry | 690 | 31523 | 0.402 | 0.388 | 1.05 (0.94, 1.17) | 4.2E−01 | 4 | rs4415084
All European Ancestry | 690 | 31523 | 0.251 | 0.241 | 1.05 (0.92, 1.19) | 4.7E−01 | 3 | rs10941679
All European Ancestry | 683 | 31409 | 0.447 | 0.412 | 1.13 (1.01, 1.26) | 2.8E−02 | 3 | rs1219648
test: Trend Test Grade
All European Ancestry | NA | NA | NA | NA | NA (0.00, 0.00) | 1.8E−02 | 4 | rs4415084
All European Ancestry | NA | NA | NA | NA | NA (0.00, 0.00) | 2.0E−02 | 3 | rs10941679
All European Ancestry | NA | NA | NA | NA | NA (0.00, 0.00) | 2.9E−01 | 3 | rs1219648
Node Status
test: Node positive vs control
All European Ancestry | 1120 | 31523 | 0.407 | 0.388 | 1.10 (1.01, 1.21) | 2.6E−02 | 4 | rs4415084
All European Ancestry | 1122 | 31523 | 0.264 | 0.241 | 1.16 (1.05, 1.28) | 3.2E−03 | 3 | rs10941679
All European Ancestry | 1113 | 31409 | 0.484 | 0.412 | 1.32 (1.21, 1.44) | 2.0E−10 | 3 | rs1219648
test: Node negative vs control
All European Ancestry | 1883 | 31523 | 0.421 | 0.388 | 1.18 (1.10, 1.26) | 4.3E−06 | 4 | rs4415084
All European Ancestry | 1886 | 31523 | 0.276 | 0.241 | 1.22 (1.12, 1.31) | 7.4E−07 | 3 | rs10941679
All European Ancestry | 1873 | 31409 | 0.470 | 0.412 | 1.20 (1.12, 1.28) | 3.0E−07 | 3 | rs1219648
test: Node positive vs negative
All European Ancestry | 1120 | 1883 | 0.406 | 0.421 | 0.94 (0.84, 1.05) | 2.6E−01 | 4 | rs4415084
All European Ancestry | 1122 | 1886 | 0.264 | 0.277 | 0.97 (0.86, 1.09) | 5.6E−01 | 3 | rs10941679
All European Ancestry | 1113 | 1873 | 0.484 | 0.470 | 1.11 (1.00, 1.24) | 4.7E−02 | 3 | rs1219648
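Each row above reports an odds ratio, its 95% confidence interval and a P-value, so the three can be cross-checked against one another. The sketch below is a minimal illustration, assuming the intervals are Wald-type intervals on the log scale, exp(log(OR) ± 1.96·SE). The function name `p_from_or_ci` is hypothetical. Because ORs and CI bounds are rounded to two decimals, the recovered P-value only approximates the tabulated one, and the approximation worsens as P gets very small.

```python
import math


def p_from_or_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Approximate two-sided P-value recovered from an OR and its 95% CI.

    Assumes a Wald-type interval on the log scale, exp(log(OR) +/- z_crit * SE),
    so the standard error can be read off the width of the interval.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided standard-normal tail: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


# Stage 1 vs Control, rs4415084: OR 1.19 (1.10, 1.29); tabulated P = 2.1E-05
p_stage1 = p_from_or_ci(1.19, 1.10, 1.29)
print(f"recovered P = {p_stage1:.1e}")
```

For this row the recovered value lands close to the tabulated 2.1E−05. Large discrepancies in other rows would more likely reflect rounding than an error in the table.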

TABLE 11 1° Familial Relative Risks by SNP Genotype
SNP | Location | Genotype 1 | # Affected with Genotype 1 | gfRRgt1 | # Affected 1° Relatives for Genotype 1 | Genotype 2 | # Affected with Genotype 2 | gfRRgt2 | # Affected 1° Relatives for Genotype 2 | gfRRgt1/gfRRgt2 | P-value
rs4415084 | 5p12 | C/T | 1089 | 1.758 | 317 | C/C | 781 | 1.632 | 202 | 1.077 | 0.0822
rs4415084 | 5p12 | T/T | 373 | 1.932 | 111 | C/C | 781 | 1.632 | 202 | 1.184 | 0.0511
rs4415084 | 5p12 | T/T | 373 | 1.932 | 111 | C/T | 1089 | 1.758 | 317 | 1.099 | 0.2410
rs10941679 | 5p12 | A/G | 884 | 1.832 | 262 | A/A | 1152 | 1.694 | 311 | 1.081 | 0.1314
rs10941679 | 5p12 | G/G | 148 | 2.192 | 50 | A/A | 1152 | 1.694 | 311 | 1.294 | 0.0599
rs10941679 | 5p12 | G/G | 148 | 2.192 | 50 | A/G | 884 | 1.832 | 262 | 1.197 | 0.1581
rs1219648 | 10q26 | A/G | 1107 | 1.709 | 301 | A/A | 600 | 1.532 | 152 | 1.115 | 0.1152
rs1219648 | 10q26 | G/G | 563 | 2.063 | 186 | A/A | 600 | 1.532 | 152 | 1.346 | 0.0019
rs1219648 | 10q26 | G/G | 563 | 2.063 | 186 | A/G | 1107 | 1.709 | 301 | 1.207 | 0.0076
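In every row of Table 11, the gfRRgt1/gfRRgt2 column matches the quotient of the two genotype-specific familial relative risks listed in the same row. The short sketch below recomputes that column for rs1219648. The dictionary and the function name `gfrr_ratio` are illustrative only. The recomputed ratios agree with the tabulated ones to within about 0.001, and the residual is consistent with rounding of the tabulated gfRR values. The sketch does not reproduce the P-values, since the test behind them is not described here.

```python
# Genotype-specific familial relative risks for rs1219648, copied from Table 11
gfrr_rs1219648 = {"A/A": 1.532, "A/G": 1.709, "G/G": 2.063}


def gfrr_ratio(genotype_1: str, genotype_2: str, gfrr: dict) -> float:
    """Quotient gfRR(genotype 1) / gfRR(genotype 2), as in the gfRRgt1/gfRRgt2 column."""
    return gfrr[genotype_1] / gfrr[genotype_2]


for g1, g2 in [("A/G", "A/A"), ("G/G", "A/A"), ("G/G", "A/G")]:
    print(f"{g1} vs {g2}: {gfrr_ratio(g1, g2, gfrr_rs1219648):.3f}")
```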

TABLE 12 Surrogate markers for marker rs4415084. Markers with values of r² greater than 0.2 to rs4415084 in the HapMap CEU dataset (http://www.hapmap.org) in a 1 Mb interval flanking the marker were selected. Shown are the name of the correlated SNP, values for r² and D′ to rs4415084, and the corresponding P-value, as well as the position of the surrogate marker in NCBI Build 36 and a reference to the SEQ ID NO containing flanking sequences for the marker.
Anchor SNP | Corr SNP | D′ | r² | P-value | Pos in Build 36 | SEQ ID NO
rs4415084 | rs4866900 | 0.92417 | 0.236675 | 3.02E−08 | 44480857 | 1
rs4415084 | rs7712213 | 1 | 0.20783 | 4.15E−09 | 44487026 | 2
rs4415084 | rs1482690 | 1 | 0.201693 | 5.32E−09 | 44524597 | 3
rs4415084 | rs1482663 | 1 | 0.207207 | 3.23E−09 | 44578859 | 4
rs4415084 | rs1351633 | 1 | 0.207207 | 3.23E−09 | 44579608 | 5
rs4415084 | rs983940 | 1 | 0.207207 | 3.23E−09 | 44579893 | 6
rs4415084 | rs4866905 | 1 | 0.207207 | 3.23E−09 | 44591624 | 7
rs4415084 | rs10079222 | 1 | 0.207207 | 3.23E−09 | 44597230 | 8
rs4415084 | rs4463187 | 0.738181 | 0.259028 | 3.59E−08 | 44604412 | 9
rs4415084 | rs10054521 | 0.736037 | 0.254284 | 5.31E−08 | 44611928 | 10
rs4415084 | rs10059745 | 0.741785 | 0.267307 | 2.28E−08 | 44622995 | 11
rs4415084 | rs6862655 | 0.741785 | 0.267307 | 2.28E−08 | 44626667 | 12
rs4415084 | rs4639238 | 0.741785 | 0.267307 | 2.28E−08 | 44627752 | 13
rs4415084 | rs10066953 | 0.738181 | 0.259028 | 3.59E−08 | 44636753 | 14
rs4415084 | rs12374507 | 0.738181 | 0.259028 | 3.59E−08 | 44640070 | 15
rs4415084 | rs4573006 | 0.738181 | 0.259028 | 3.59E−08 | 44647407 | 16
rs4415084 | rs4529201 | 0.738181 | 0.259028 | 3.59E−08 | 44649728 | 17
rs4415084 | rs6866354 | 0.738181 | 0.259028 | 3.59E−08 | 44662567 | 18
rs4415084 | rs4463188 | 1 | 1 | 7.12E−36 | 44678427 | 19
rs4415084 | rs4321755 | 1 | 1 | 1.28E−36 | 44681952 | 20
rs4415084 | rs4492118 | 1 | 1 | 2.08E−36 | 44682382 | 21
rs4415084 | rs4613718 | 1 | 0.459459 | 2.24E−17 | 44685701 | 22
rs4415084 | rs7735881 | 1 | 1 | 1.28E−36 | 44685933 | 23
rs4415084 | rs7723539 | 1 | 1 | 1.28E−36 | 44695967 | 24
rs4415084 | rs10805685 | 1 | 1 | 2.08E−36 | 44697715 | 25
rs4415084 | rs10941677 | 1 | 1 | 5.45E−36 | 44698156 | 26
rs4415084 | rs4415084 | 1 | 1 | NA | 44698272 | 235
rs4415084 | rs4415085 | 1 | 1 | 1.28E−36 | 44698716 | 27
rs4415084 | rs7720551 | 1 | 1 | 1.28E−36 | 44700234 | 28
rs4415084 | rs6874055 | 1 | 1 | 3.35E−36 | 44702722 | 29
rs4415084 | rs4419600 | 1 | 1 | 1.28E−36 | 44714291 | 30
rs4415084 | rs12187196 | 1 | 1 | 1.28E−36 | 44719576 | 31
rs4415084 | rs12522626 | 1 | 1 | 3.38E−36 | 44721455 | 32
rs4415084 | rs4571480 | 1 | 1 | 5.45E−36 | 44722945 | 33
rs4415084 | rs6451770 | 1 | 1 | 1.28E−36 | 44727152 | 34
rs4415084 | rs12515012 | 1 | 1 | 1.28E−36 | 44730292 | 35
rs4415084 | rs2165009 | 1 | 1 | 2.08E−36 | 44733673 | 36
rs4415084 | rs13156930 | 1 | 1 | 1.28E−36 | 44733792 | 37
rs4415084 | rs920328 | 1 | 0.93135 | 1.41E−32 | 44734808 | 38
rs4415084 | rs1821936 | 1 | 1 | 2.08E−36 | 44735239 | 39
rs4415084 | rs714130 | 1 | 1 | 1.28E−36 | 44737175 | 40
rs4415084 | rs2013513 | 1 | 1 | 5.45E−36 | 44738063 | 41
rs4415084 | rs920329 | 1 | 1 | 2.71E−36 | 44738264 | 42
rs4415084 | rs2218081 | 1 | 1 | 1.28E−36 | 44740897 | 43
rs4415084 | rs10941679 | 1 | 0.512661 | 2.03E−17 | 44742255 | 44
rs4415084 | rs2165010 | 1 | 1 | 3.35E−36 | 44742537 | 45
rs4415084 | rs1438825 | 1 | 1 | 1.28E−36 | 44742688 | 46
rs4415084 | rs6861560 | 1 | 1 | 1.28E−36 | 44744135 | 47
rs4415084 | rs16901937 | 1 | 0.965497 | 1.66E−34 | 44744898 | 48
rs4415084 | rs2218080 | 0.964747 | 0.930737 | 2.74E−31 | 44750087 | 49
rs4415084 | rs11747159 | 0.920891 | 0.708022 | 7.34E−22 | 44773467 | 50
rs4415084 | rs2330572 | 0.889429 | 0.736776 | 7.58E−23 | 44776746 | 51
rs4415084 | rs994793 | 0.889429 | 0.736776 | 7.58E−23 | 44779004 | 52
rs4415084 | rs1438827 | 0.884157 | 0.677153 | 1.15E−20 | 44787713 | 53
rs4415084 | rs11949847 | 0.922578 | 0.766182 | 1.05E−23 | 44787926 | 54
rs4415084 | rs7712949 | 0.920891 | 0.708022 | 7.34E−22 | 44806102 | 55
rs4415084 | rs13154781 | 0.920891 | 0.708022 | 7.34E−22 | 44810784 | 56
rs4415084 | rs11746980 | 0.924573 | 0.767952 | 4.51E−24 | 44813635 | 57
rs4415084 | rs7711697 | 0.924114 | 0.764101 | 4.56E−23 | 44816160 | 58
rs4415084 | rs16901964 | 0.920891 | 0.708022 | 7.34E−22 | 44819012 | 59
rs4415084 | rs6875933 | 0.920497 | 0.707417 | 1.46E−21 | 44822453 | 60
rs4415084 | rs727305 | 0.920891 | 0.708022 | 7.34E−22 | 44831799 | 61
rs4415084 | rs13177711 | 0.923589 | 0.767079 | 6.86E−24 | 44832719 | 62
rs4415084 | rs1438820 | 0.924573 | 0.767952 | 4.51E−24 | 44833527 | 63
rs4415084 | rs1438819 | 0.920891 | 0.708022 | 7.34E−22 | 44833603 | 64
rs4415084 | rs12651949 | 0.910999 | 0.664841 | 4.20E−16 | 44833869 | 65
rs4415084 | rs10462080 | 0.920891 | 0.708022 | 7.34E−22 | 44834809 | 66
rs4415084 | rs10462081 | 0.920891 | 0.708022 | 7.34E−22 | 44836422 | 67
rs4415084 | rs13183209 | 0.920891 | 0.708022 | 7.34E−22 | 44839506 | 68
rs4415084 | rs6872254 | 0.920891 | 0.708022 | 7.34E−22 | 44839541 | 69
rs4415084 | rs7717459 | 0.920891 | 0.708022 | 7.34E−22 | 44840282 | 70
rs4415084 | rs13159598 | 0.924573 | 0.767952 | 4.51E−24 | 44841683 | 71
rs4415084 | rs3761648 | 0.919139 | 0.701985 | 5.44E−21 | 44843836 | 72
rs4415084 | rs3747479 | 0.919139 | 0.701985 | 5.44E−21 | 44844919 | 73
rs4415084 | rs1866406 | 0.920891 | 0.708022 | 7.34E−22 | 44845702 | 74
rs4415084 | rs13174122 | 0.920891 | 0.708022 | 7.34E−22 | 44846497 | 75
rs4415084 | rs11746506 | 0.920891 | 0.708022 | 7.34E−22 | 44848323 | 76
rs4415084 | rs12188871 | 0.918911 | 0.679189 | 7.59E−21 | 44849761 | 77
rs4415084 | rs11741772 | 0.917793 | 0.698764 | 8.18E−21 | 44850354 | 78
rs4415084 | rs7716571 | 0.923792 | 0.762262 | 2.31E−23 | 44852741 | 79
rs4415084 | rs7720787 | 0.924573 | 0.767952 | 4.51E−24 | 44853066 | 80
rs4415084 | rs9637783 | 0.920891 | 0.708022 | 7.34E−22 | 44855403 | 81
rs4415084 | rs1061310 | 0.924573 | 0.767952 | 4.51E−24 | 44856607 | 82
rs4415084 | rs4457089 | 0.920891 | 0.708022 | 7.34E−22 | 44857493 | 83
rs4415084 | rs13189120 | 0.920891 | 0.708022 | 7.34E−22 | 44858040 | 84
rs4415084 | rs930395 | 1 | 0.402174 | 5.87E−14 | 44858215 | 85
rs4415084 | rs10512865 | 0.924573 | 0.767952 | 4.51E−24 | 44859124 | 86
rs4415084 | rs6867533 | 0.924573 | 0.767952 | 4.51E−24 | 44863049 | 87
rs4415084 | rs6868232 | 0.920271 | 0.698768 | 4.63E−21 | 44863437 | 88
rs4415084 | rs12513749 | 0.920694 | 0.707719 | 1.47E−21 | 44863960 | 89
rs4415084 | rs12518851 | 0.910529 | 0.687491 | 1.15E−18 | 44863988 | 90
rs4415084 | rs1048758 | 0.924573 | 0.767952 | 4.51E−24 | 44864351 | 91
rs4415084 | rs13155698 | 0.920891 | 0.708022 | 7.34E−22 | 44864438 | 92
rs4415084 | rs13160259 | 0.924573 | 0.767952 | 4.51E−24 | 44864721 | 93
rs4415084 | rs6896350 | 0.920891 | 0.708022 | 7.34E−22 | 44868328 | 94
rs4415084 | rs1371025 | 0.91986 | 0.707156 | 1.09E−21 | 44869990 | 95
rs4415084 | rs4596389 | 0.920891 | 0.708022 | 7.34E−22 | 44872313 | 96
rs4415084 | rs6451775 | 0.920891 | 0.708022 | 7.34E−22 | 44872545 | 97
rs4415084 | rs7380559 | 0.924573 | 0.767952 | 4.51E−24 | 44872767 | 98
rs4415084 | rs729599 | 0.920891 | 0.708022 | 7.34E−22 | 44878017 | 99
rs4415084 | rs987394 | 0.920891 | 0.708022 | 7.34E−22 | 44882135 | 100
rs4415084 | rs7715731 | 0.917461 | 0.686969 | 6.41E−20 | 44882601 | 101
rs4415084 | rs4440370 | 0.920891 | 0.708022 | 7.34E−22 | 44889109 | 102
rs4415084 | rs4492119 | 0.920497 | 0.707417 | 1.46E−21 | 44891371 | 103
rs4415084 | rs7703497 | 0.920891 | 0.708022 | 7.34E−22 | 44892785 | 104
rs4415084 | rs6451778 | 0.920891 | 0.708022 | 7.34E−22 | 44893745 | 105
rs4415084 | rs13362132 | 0.924573 | 0.767952 | 4.51E−24 | 44894017 | 106
rs4415084 | rs1438821 | 0.922391 | 0.762608 | 2.84E−23 | 44894208 | 107
rs4415084 | rs1438822 | 0.920891 | 0.708022 | 7.34E−22 | 44894929 | 108
rs4415084 | rs4373287 | 0.920891 | 0.708022 | 7.34E−22 | 44898641 | 109
rs4415084 | rs6871052 | 0.919658 | 0.706847 | 2.18E−21 | 44899074 | 110
rs4415084 | rs6893319 | 0.920891 | 0.708022 | 7.34E−22 | 44899486 | 111
rs4415084 | rs10053247 | 0.920891 | 0.708022 | 7.34E−22 | 44899716 | 112
rs4415084 | rs10040082 | 0.957789 | 0.739047 | 9.28E−23 | 44901611 | 113
rs4415084 | rs10057521 | 0.920891 | 0.708022 | 7.34E−22 | 44901743 | 114
rs4415084 | rs10065638 | 0.920891 | 0.708022 | 7.34E−22 | 44901919 | 115
rs4415084 | rs6894324 | 0.919555 | 0.702623 | 2.73E−21 | 44903093 | 116
rs4415084 | rs4395640 | 0.920891 | 0.708022 | 7.34E−22 | 44904857 | 117
rs4415084 | rs10070037 | 0.920891 | 0.708022 | 7.34E−22 | 44905994 | 118
rs4415084 | rs4518409 | 0.924573 | 0.767952 | 4.51E−24 | 44906609 | 119
rs4415084 | rs9292913 | 0.923589 | 0.767079 | 6.86E−24 | 44906636 | 120
rs4415084 | rs9292914 | 0.906242 | 0.69671 | 2.34E−18 | 44907138 | 121
rs4415084 | rs10059086 | 0.920891 | 0.708022 | 7.34E−22 | 44907764 | 122
rs4415084 | rs11951760 | 0.914383 | 0.695917 | 2.65E−20 | 44907929 | 123
rs4415084 | rs4329028 | 0.924573 | 0.767952 | 4.51E−24 | 44908110 | 124
rs4415084 | rs7716600 | 1 | 0.391985 | 1.56E−13 | 44910762 | 125
rs4415084 | rs4412123 | 0.924573 | 0.767952 | 4.51E−24 | 44912045 | 126
rs4415084 | rs7705343 | 0.924573 | 0.767952 | 4.51E−24 | 44915334 | 127
rs4415084 | rs10040488 | 0.920891 | 0.708022 | 7.34E−22 | 44916045 | 128
rs4415084 | rs4642377 | 0.920589 | 0.703488 | 1.84E−21 | 44920997 | 129
rs4415084 | rs4391175 | 0.920891 | 0.708022 | 7.34E−22 | 44925813 | 130
rs4415084 | rs4129642 | 0.920891 | 0.708022 | 7.34E−22 | 44933886 | 131
rs4415084 | rs9790879 | 0.924573 | 0.767952 | 4.51E−24 | 44935642 | 132
rs4415084 | rs9790896 | 0.88242 | 0.702742 | 3.43E−21 | 44935848 | 133
rs4415084 | rs4457088 | 0.919555 | 0.702623 | 2.73E−21 | 44936711 | 134
rs4415084 | rs4866784 | 0.920891 | 0.708022 | 7.34E−22 | 44936888 | 135
rs4415084 | rs9791056 | 0.920891 | 0.708022 | 7.34E−22 | 44939648 | 136
rs4415084 | rs6880275 | 0.920891 | 0.708022 | 7.34E−22 | 44944692 | 137
rs4415084 | rs6870136 | 0.920891 | 0.708022 | 7.34E−22 | 44946419 | 138
rs4415084 | rs6881563 | 0.920891 | 0.708022 | 7.34E−22 | 44948610 | 139
rs4415084 | rs7703618 | 0.920891 | 0.708022 | 7.34E−22 | 44950336 | 140
rs4415084 | rs10077814 | 0.924573 | 0.767952 | 4.51E−24 | 44952546 | 141
rs4415084 | rs6451783 | 0.920891 | 0.708022 | 7.34E−22 | 44954050 | 142
rs4415084 | rs4298259 | 0.920891 | 0.708022 | 7.34E−22 | 44956468 | 143
rs4415084 | rs7736092 | 0.91986 | 0.707156 | 1.09E−21 | 44956752 | 144
rs4415084 | rs7728431 | 0.920232 | 0.705178 | 1.49E−21 | 44958436 | 145
rs4415084 | rs7708506 | 0.918803 | 0.706267 | 1.62E−21 | 44958461 | 146
rs4415084 | rs10039866 | 0.920891 | 0.708022 | 7.34E−22 | 44960818 | 147
rs4415084 | rs10043344 | 0.922377 | 0.762771 | 2.62E−23 | 44962275 | 148
rs4415084 | rs10038554 | 0.919555 | 0.702623 | 2.73E−21 | 44962864 | 149
rs4415084 | rs10044096 | 0.924573 | 0.767952 | 4.51E−24 | 44963122 | 150
rs4415084 | rs10041518 | 0.920891 | 0.708022 | 7.34E−22 | 44963163 | 151
rs4415084 | rs12517690 | 0.920891 | 0.708022 | 7.34E−22 | 44975050 | 152
rs4415084 | rs6875287 | 0.917171 | 0.700475 | 1.83E−20 | 44977387 | 153
rs4415084 | rs11958808 | 0.924573 | 0.767952 | 4.51E−24 | 44980847 | 154
rs4415084 | rs3935086 | 0.904005 | 0.519752 | 1.24E−15 | 44996680 | 155
rs4415084 | rs3935213 | 0.850441 | 0.267659 | 2.47E−08 | 44997201 | 156
rs4415084 | rs4460145 | 0.8543 | 0.278388 | 1.01E−08 | 45004083 | 157
rs4415084 | rs6869488 | 0.8543 | 0.278388 | 1.01E−08 | 45006273 | 158
rs4415084 | rs6866995 | 0.847224 | 0.259219 | 3.63E−08 | 45012604 | 159
rs4415084 | rs2067980 | 1 | 0.265513 | 1.01E−09 | 45018074 | 160
rs4415084 | rs4296810 | 0.836134 | 0.238566 | 2.48E−07 | 45019919 | 161
rs4415084 | rs7709661 | 0.834245 | 0.243662 | 4.06E−07 | 45039846 | 162
rs4415084 | rs6894974 | 0.839261 | 0.254787 | 1.68E−07 | 45056288 | 163
rs4415084 | rs4533894 | 0.841387 | 0.255659 | 1.41E−07 | 45060826 | 164
rs4415084 | rs4371761 | 0.83644 | 0.244535 | 3.42E−07 | 45061977 | 165
rs4415084 | rs7716101 | 0.841387 | 0.255659 | 1.41E−07 | 45065624 | 166
rs4415084 | rs7731099 | 0.841176 | 0.262636 | 1.12E−07 | 45073783 | 167
rs4415084 | rs7701679 | 0.828648 | 0.232155 | 9.87E−07 | 45078551 | 168
rs4415084 | rs12522398 | 0.82443 | 0.225046 | 8.11E−07 | 45085230 | 169
rs4415084 | rs4502832 | 0.836134 | 0.238566 | 2.48E−07 | 45087138 | 170
rs4415084 | rs11948186 | 0.916552 | 0.65068 | 6.95E−20 | 45087191 | 171
rs4415084 | rs12054976 | 0.841387 | 0.255659 | 1.41E−07 | 45093077 | 172
rs4415084 | rs4485937 | 0.836134 | 0.238566 | 2.48E−07 | 45101400 | 173
rs4415084 | rs4389695 | 0.836134 | 0.238566 | 2.48E−07 | 45107668 | 174
rs4415084 | rs13183434 | 1 | 0.265513 | 1.01E−09 | 45110390 | 175
rs4415084 | rs12521639 | 0.841387 | 0.255659 | 1.41E−07 | 45114238 | 176
rs4415084 | rs10051592 | 0.916552 | 0.65068 | 6.95E−20 | 45126063 | 177
rs4415084 | rs6885307 | 0.841387 | 0.255659 | 1.41E−07 | 45130260 | 178
rs4415084 | rs10805692 | 0.836134 | 0.238566 | 2.48E−07 | 45135215 | 179
rs4415084 | rs10941692 | 0.836134 | 0.238566 | 2.48E−07 | 45135535 | 180
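The surrogate-marker tables report two linkage disequilibrium (LD) measures for each SNP pair: Lewontin's D′ and the squared correlation r². The sketch below computes both from a haplotype frequency and the two allele frequencies, using the standard definitions. Estimating haplotype frequencies from HapMap genotypes is a separate step and is not shown. The function name `ld_stats` and the example frequencies are hypothetical. The frequencies were chosen so that allele B occurs only on haplotypes that also carry allele A, which gives complete LD (D′ = 1) while r² stays well below 1. The pair rs4415084/rs10941679 shows the same pattern, with entries of 1 and 0.512661. In general r² can never exceed D′², so when allele frequencies differ a strong D′ does not guarantee a high r².

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic SNPs from haplotype and allele frequencies.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0  # Lewontin's D'
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))  # squared allelic correlation
    return d_prime, r2


# Hypothetical frequencies: allele B (0.25) occurs only on haplotypes carrying allele A (0.40)
d_prime, r2 = ld_stats(p_ab=0.25, p_a=0.40, p_b=0.25)
print(f"D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```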

TABLE 13 Surrogate SNP markers for marker rs10941679. Markers with values of r² greater than 0.2 to rs10941679 in the HapMap CEU dataset (http://www.hapmap.org) in a 1 Mb interval flanking the marker were selected. Shown are the name of the correlated SNP, values for r² and D′ to rs10941679, and the corresponding P-value, as well as the position of the surrogate marker in NCBI Build 36 and a reference to the SEQ ID NO containing flanking sequences for the marker.
Discovery SNP | Corr SNP | D′ | r² | P-value | Pos in Build 36 | SEQ ID NO
rs10941679 | rs10473354 | 0.697983 | 0.234357 | 3.79E−07 | 44432110 | 181
rs10941679 | rs12054807 | 0.712181 | 0.251051 | 9.37E−08 | 44433098 | 182
rs10941679 | rs10941665 | 0.711093 | 0.248052 | 1.16E−07 | 44434453 | 183
rs10941679 | rs7356597 | 0.701296 | 0.242799 | 2.48E−07 | 44435967 | 184
rs10941679 | rs2200123 | 0.700877 | 0.292463 | 6.90E−08 | 44444748 | 185
rs10941679 | rs10472394 | 0.71647 | 0.263453 | 5.64E−08 | 44445489 | 186
rs10941679 | rs10055789 | 0.715333 | 0.260073 | 5.92E−08 | 44446940 | 187
rs10941679 | rs10055953 | 0.714293 | 0.257042 | 7.33E−08 | 44447011 | 188
rs10941679 | rs2330551 | 0.715333 | 0.260073 | 5.92E−08 | 44448702 | 189
rs10941679 | rs987852 | 0.712181 | 0.251051 | 9.37E−08 | 44450245 | 190
rs10941679 | rs1482668 | 0.712181 | 0.251051 | 9.37E−08 | 44450407 | 191
rs10941679 | rs2877162 | 0.715333 | 0.260073 | 5.92E−08 | 44451149 | 192
rs10941679 | rs2877163 | 0.711093 | 0.248052 | 1.16E−07 | 44451226 | 193
rs10941679 | rs2330553 | 0.712181 | 0.251051 | 9.37E−08 | 44451426 | 194
rs10941679 | rs1482667 | 0.704607 | 0.251717 | 1.60E−07 | 44452403 | 195
rs10941679 | rs4242112 | 0.712181 | 0.251051 | 9.37E−08 | 44452490 | 196
rs10941679 | rs1384451 | 0.766758 | 0.281037 | 1.07E−08 | 44455011 | 197
rs10941679 | rs1482685 | 0.766758 | 0.281037 | 1.07E−08 | 44456232 | 198
rs10941679 | rs13357659 | 0.766758 | 0.281037 | 1.07E−08 | 44468642 | 199
rs10941679 | rs6893590 | 0.919347 | 0.220962 | 1.03E−07 | 44487227 | 200
rs10941679 | rs8180484 | 0.915674 | 0.20433 | 2.43E−07 | 44507720 | 201
rs10941679 | rs1384450 | 0.915674 | 0.20433 | 2.43E−07 | 44515901 | 202
rs10941679 | rs10941667 | 0.915899 | 0.20528 | 2.44E−07 | 44530438 | 203
rs10941679 | rs16901890 | 0.628697 | 0.218876 | 3.40E−06 | 44548272 | 204
rs10941679 | rs2128434 | 0.913386 | 0.20415 | 4.85E−07 | 44549566 | 205
rs10941679 | rs2128435 | 0.917481 | 0.212203 | 1.43E−07 | 44552968 | 206
rs10941679 | rs4866777 | 0.919213 | 0.220312 | 8.32E−08 | 44574747 | 207
rs10941679 | rs1482698 | 0.830339 | 0.353461 | 9.52E−11 | 44575210 | 208
rs10941679 | rs4866902 | 0.917008 | 0.219257 | 1.66E−07 | 44580477 | 209
rs10941679 | rs10805684 | 0.919213 | 0.220312 | 8.32E−08 | 44587002 | 210
rs10941679 | rs7708449 | 0.664362 | 0.242956 | 3.43E−07 | 44604983 | 211
rs10941679 | rs7713139 | 0.661966 | 0.237133 | 5.11E−07 | 44605617 | 212
rs10941679 | rs10462078 | 0.664362 | 0.242956 | 3.43E−07 | 44621291 | 213
rs10941679 | rs7448715 | 0.664362 | 0.242956 | 3.43E−07 | 44621309 | 214
rs10941679 | rs4866911 | 0.664362 | 0.242956 | 3.43E−07 | 44622497 | 215
rs10941679 | rs4392631 | 0.664362 | 0.242956 | 3.43E−07 | 44628924 | 216
rs10941679 | rs4866779 | 0.668197 | 0.252735 | 2.11E−07 | 44659107 | 217
rs10941679 | rs11952948 | 0.662849 | 0.246642 | 4.89E−07 | 44663041 | 218
rs10941679 | rs4463188 | 1 | 0.510791 | 1.02E−16 | 44678427 | 19
rs10941679 | rs4321755 | 1 | 0.512661 | 2.03E−17 | 44681952 | 20
rs10941679 | rs4492118 | 1 | 0.509261 | 2.69E−17 | 44682382 | 21
rs10941679 | rs4613718 | 1 | 0.235547 | 4.63E−10 | 44685701 | 22
rs10941679 | rs7735881 | 1 | 0.512661 | 2.03E−17 | 44685933 | 23
rs10941679 | rs7723539 | 1 | 0.512661 | 2.03E−17 | 44695967 | 24
rs10941679 | rs10805685 | 1 | 0.509261 | 2.69E−17 | 44697715 | 25
rs10941679 | rs10941677 | 1 | 0.509261 | 7.08E−17 | 44698156 | 26
rs10941679 | rs4415084 | 1 | 0.512661 | 2.03E−17 | 44698272 | 219
rs10941679 | rs4415085 | 1 | 0.512661 | 2.03E−17 | 44698716 | 27
rs10941679 | rs7720551 | 1 | 0.512661 | 2.03E−17 | 44700234 | 28
rs10941679 | rs6874055 | 1 | 0.512661 | 5.36E−17 | 44702722 | 29
rs10941679 | rs4419600 | 1 | 0.512661 | 2.03E−17 | 44714291 | 30
rs10941679 | rs12187196 | 1 | 0.512661 | 2.03E−17 | 44719576 | 31
rs10941679 | rs12522626 | 1 | 0.505814 | 3.57E−17 | 44721455 | 32
rs10941679 | rs4571480 | 1 | 0.509261 | 7.08E−17 | 44722945 | 33
rs10941679 | rs6451770 | 1 | 0.512661 | 2.03E−17 | 44727152 | 34
rs10941679 | rs12515012 | 1 | 0.512661 | 2.03E−17 | 44730292 | 35
rs10941679 | rs2165009 | 1 | 0.509261 | 2.69E−17 | 44733673 | 36
rs10941679 | rs13156930 | 1 | 0.512661 | 2.03E−17 | 44733792 | 37
rs10941679 | rs920328 | 1 | 0.55045 | 2.49E−18 | 44734808 | 38
rs10941679 | rs1821936 | 1 | 0.509261 | 2.69E−17 | 44735239 | 39
rs10941679 | rs714130 | 1 | 0.512661 | 2.03E−17 | 44737175 | 40
rs10941679 | rs2013513 | 1 | 0.509261 | 7.08E−17 | 44738063 | 41
rs10941679 | rs920329 | 1 | 0.509261 | 2.69E−17 | 44738264 | 42
rs10941679 | rs2218081 | 1 | 0.512661 | 2.03E−17 | 44740897 | 43
rs10941679 | rs10941679 | 1 | 1 | NA | 44742255 | 236
rs10941679 | rs2165010 | 1 | 0.512661 | 5.36E−17 | 44742537 | 45
rs10941679 | rs1438825 | 1 | 0.512661 | 2.03E−17 | 44742688 | 46
rs10941679 | rs6861560 | 1 | 0.512661 | 2.03E−17 | 44744135 | 47
rs10941679 | rs16901937 | 1 | 0.494973 | 5.47E−17 | 44744898 | 48
rs10941679 | rs2218080 | 0.943571 | 0.456436 | 5.90E−14 | 44750087 | 49
rs10941679 | rs11747159 | 0.841573 | 0.434894 | 2.26E−12 | 44773467 | 50
rs10941679 | rs2330572 | 0.834953 | 0.383744 | 3.61E−11 | 44776746 | 51
rs10941679 | rs994793 | 0.834953 | 0.383744 | 3.61E−11 | 44779004 | 52
rs10941679 | rs1438827 | 0.839428 | 0.417031 | 5.91E−12 | 44787713 | 53
rs10941679 | rs11949847 | 0.836361 | 0.39369 | 2.42E−11 | 44787926 | 54
rs10941679 | rs7712949 | 0.841573 | 0.434894 | 2.26E−12 | 44806102 | 55
rs10941679 | rs13154781 | 0.841573 | 0.434894 | 2.26E−12 | 44810784 | 56
rs10941679 | rs11746980 | 0.837222 | 0.4 | 1.49E−11 | 44813635 | 57
rs10941679 | rs7711697 | 0.831574 | 0.396173 | 3.79E−11 | 44816160 | 58
rs10941679 | rs16901964 | 0.841573 | 0.434894 | 2.26E−12 | 44819012 | 59
rs10941679 | rs6875933 | 0.841573 | 0.434894 | 2.26E−12 | 44822453 | 60
rs10941679 | rs727305 | 0.841573 | 0.434894 | 2.26E−12 | 44831799 | 61
rs10941679 | rs13177711 | 0.836797 | 0.396866 | 1.90E−11 | 44832719 | 62
rs10941679 | rs1438820 | 0.837222 | 0.4 | 1.49E−11 | 44833527 | 63
rs10941679 | rs1438819 | 0.841573 | 0.434894 | 2.26E−12 | 44833603 | 64
rs10941679 | rs12651949 | 0.797511 | 0.412815 | 1.53E−08 | 44833869 | 65
rs10941679 | rs10462080 | 0.841573 | 0.434894 | 2.26E−12 | 44834809 | 66
rs10941679 | rs10462081 | 0.841573 | 0.434894 | 2.26E−12 | 44836422 | 67
rs10941679 | rs13183209 | 0.841573 | 0.434894 | 2.26E−12 | 44839506 | 68
rs10941679 | rs6872254 | 0.841573 | 0.434894 | 2.26E−12 | 44839541 | 69
rs10941679 | rs7717459 | 0.841573 | 0.434894 | 2.26E−12 | 44840282 | 70
rs10941679 | rs13159598 | 0.837222 | 0.4 | 1.49E−11 | 44841683 | 71
rs10941679 | rs3761648 | 0.843009 | 0.44764 | 1.34E−12 | 44843836 | 72
rs10941679 | rs3747479 | 0.843009 | 0.44764 | 1.34E−12 | 44844919 | 73
rs10941679 | rs1866406 | 0.841573 | 0.434894 | 2.26E−12 | 44845702 | 74
rs10941679 | rs13174122 | 0.841573 | 0.434894 | 2.26E−12 | 44846497 | 75
rs10941679 | rs11746506 | 0.841573 | 0.434894 | 2.26E−12 | 44848323 | 76
rs10941679 | rs12188871 | 0.843661 | 0.453652 | 8.25E−13 | 44849761 | 77
rs10941679 | rs11741772 | 0.832079 | 0.420989 | 1.87E−11 | 44850354 | 78
rs10941679 | rs7716571 | 0.8282 | 0.392326 | 7.08E−11 | 44852741 | 79
rs10941679 | rs7720787 | 0.837222 | 0.4 | 1.49E−11 | 44853066 | 80
rs10941679 | rs9637783 | 0.841573 | 0.434894 | 2.26E−12 | 44855403 | 81
rs10941679 | rs1061310 | 0.837222 | 0.4 | 1.49E−11 | 44856607 | 82
rs10941679 | rs4457089 | 0.841573 | 0.434894 | 2.26E−12 | 44857493 | 83
rs10941679 | rs13189120 | 0.841573 | 0.434894 | 2.26E−12 | 44858040 | 84
rs10941679 | rs930395 | 1 | 0.784483 | 4.20E−22 | 44858215 | 85
rs10941679 | rs10512865 | 0.837222 | 0.4 | 1.49E−11 | 44859124 | 86
rs10941679 | rs6867533 | 0.837222 | 0.4 | 1.49E−11 | 44863049 | 87
rs10941679 | rs6868232 | 0.830195 | 0.426748 | 1.62E−11 | 44863437 | 88
rs10941679 | rs12513749 | 0.840368 | 0.43365 | 4.47E−12 | 44863960 | 89
rs10941679 | rs12518851 | 0.830617 | 0.419161 | 5.99E−11 | 44863988 | 90
rs10941679 | rs1048758 | 0.837222 | 0.4 | 1.49E−11 | 44864351 | 91
rs10941679 | rs13155698 | 0.841573 | 0.434894 | 2.26E−12 | 44864438 | 92
rs10941679 | rs13160259 | 0.837222 | 0.4 | 1.49E−11 | 44864721 | 93
rs10941679 | rs6896350 | 0.841573 | 0.434894 | 2.26E−12 | 44868328 | 94
rs10941679 | rs1371025 | 0.841219 | 0.431852 | 2.88E−12 | 44869990 | 95
rs10941679 | rs4596389 | 0.841573 | 0.434894 | 2.26E−12 | 44872313 | 96
rs10941679 | rs6451775 | 0.841573 | 0.434894 | 2.26E−12 | 44872545 | 97
rs10941679 | rs7380559 | 0.837222 | 0.4 | 1.49E−11 | 44872767 | 98
rs10941679 | rs729599 | 0.841573 | 0.434894 | 2.26E−12 | 44878017 | 99
rs10941679 | rs987394 | 0.841573 | 0.434894 | 2.26E−12 | 44882135 | 100
rs10941679 | rs7715731 | 0.816216 | 0.41132 | 1.82E−10 | 44882601 | 101
rs10941679 | rs4440370 | 0.841573 | 0.434894 | 2.26E−12 | 44889109 | 102
rs10941679 | rs4492119 | 0.841573 | 0.434894 | 2.26E−12 | 44891371 | 103
rs10941679 | rs7703497 | 0.841573 | 0.434894 | 2.26E−12 | 44892785 | 104
rs10941679 | rs6451778 | 0.841573 | 0.434894 | 2.26E−12 | 44893745 | 105
rs10941679 | rs13362132 | 0.837222 | 0.4 | 1.49E−11 | 44894017 | 106
rs10941679 | rs1438821 | 0.828917 | 0.387517 | 5.73E−11 | 44894208 | 107
rs10941679 | rs1438822 | 0.841573 | 0.434894 | 2.26E−12 | 44894929 | 108
rs10941679 | rs4373287 | 0.841573 | 0.434894 | 2.26E−12 | 44898641 | 109
rs10941679 | rs6871052 | 0.839995 | 0.430591 | 5.70E−12 | 44899074 | 110
rs10941679 | rs6893319 | 0.841573 | 0.434894 | 2.26E−12 | 44899486 | 111
rs10941679 | rs10053247 | 0.841573 | 0.434894 | 2.26E−12 | 44899716 | 112
rs10941679 | rs10040082 | 0.84267 | 0.444569 | 1.72E−12 | 44901611 | 113
rs10941679 | rs10057521 | 0.841573 | 0.434894 | 2.26E−12 | 44901743 | 114
rs10941679 | rs10065638 | 0.841573 | 0.434894 | 2.26E−12 | 44901919 | 115
rs10941679 | rs6894324 | 0.835715 | 0.427926 | 7.66E−12 | 44903093 | 116
rs10941679 | rs4395640 | 0.841573 | 0.434894 | 2.26E−12 | 44904857 | 117
rs10941679 | rs10070037 | 0.841573 | 0.434894 | 2.26E−12 | 44905994 | 118
rs10941679 | rs4518409 | 0.837222 | 0.4 | 1.49E−11 | 44906609 | 119
rs10941679 | rs9292913 | 0.836797 | 0.396866 | 1.90E−11 | 44906636 | 120
rs10941679 | rs9292914 | 0.924223 | 0.446215 | 9.11E−11 | 44907138 | 121
rs10941679 | rs10059086 | 0.841573 | 0.434894 | 2.26E−12 | 44907764 | 122
rs10941679 | rs11951760 | 0.885323 | 0.466609 | 1.29E−12 | 44907929 | 123
rs10941679 | rs4329028 | 0.837222 | 0.4 | 1.49E−11 | 44908110 | 124
rs10941679 | rs7716600 | 1 | 0.7772 | 1.77E−21 | 44910762 | 125
rs10941679 | rs4412123 | 0.837222 | 0.4 | 1.49E−11 | 44912045 | 126
rs10941679 | rs7705343 | 0.837222 | 0.4 | 1.49E−11 | 44915334 | 127
rs10941679 | rs10040488 | 0.841573 | 0.434894 | 2.26E−12 | 44916045 | 128
rs10941679 | rs4642377 | 0.836082 | 0.43096 | 6.03E−12 | 44920997 | 129
rs10941679 | rs4391175 | 0.841573 | 0.434894 | 2.26E−12 | 44925813 | 130
rs10941679 | rs4129642 | 0.841573 | 0.434894 | 2.26E−12 | 44933886 | 131
rs10941679 | rs9790879 | 0.837222 | 0.4 | 1.49E−11 | 44935642 | 132
rs10941679 | rs9790896 | 0.780757 | 0.340217 | 9.70E−10 | 44935848 | 133
rs10941679 | rs4457088 | 0.835715 | 0.427926 | 7.66E−12 | 44936711 | 134
rs10941679 | rs4866784 | 0.841573 | 0.434894 | 2.26E−12 | 44936888 | 135
rs10941679 | rs9791056 | 0.841573 | 0.434894 | 2.26E−12 | 44939648 | 136
rs10941679 | rs6880275 | 0.841573 | 0.434894 | 2.26E−12 | 44944692 | 137
rs10941679 | rs6870136 | 0.841573 | 0.434894 | 2.26E−12 | 44946419 | 138
rs10941679 | rs6881563 | 0.841573 | 0.434894 | 2.26E−12 | 44948610 | 139
rs10941679 | rs7703618 | 0.841573 | 0.434894 | 2.26E−12 | 44950336 | 140
rs10941679 | rs10077814 | 0.837222 | 0.4 | 1.49E−11 | 44952546 | 141
rs10941679 | rs6451783 | 0.841573 | 0.434894 | 2.26E−12 | 44954050 | 142
rs10941679 | rs4298259 | 0.841573 | 0.434894 | 2.26E−12 | 44956468 | 143
rs10941679 | rs7736092 | 0.841219 | 0.431852 | 2.88E−12 | 44956752 | 144
rs10941679 | rs7728431 | 0.838562 | 0.431223 | 4.37E−12 | 44958436 | 145
rs10941679 | rs7708506 | 0.840855 | 0.428766 | 3.67E−12 | 44958461 | 146
rs10941679 | rs10039866 | 0.841573 | 0.434894 | 2.26E−12 | 44960818 | 147
rs10941679 | rs10043344 | 0.83823 | 0.407626 | 1.23E−11 | 44962275 | 148
rs10941679 | rs10038554 | 0.835715 | 0.427926 | 7.66E−12 | 44962864 | 149
rs10941679 | rs10044096 | 0.837222 | 0.4 | 1.49E−11 | 44963122 | 150
rs10941679 | rs10041518 | 0.841573 | 0.434894 | 2.26E−12 | 44963163 | 151
rs10941679 | rs12517690 | 0.841573 | 0.434894 | 2.26E−12 | 44975050 | 152
rs10941679 | rs6875287 | 0.833288 | 0.419968 | 3.75E−11 | 44977387 | 153
rs10941679 | rs11958808 | 0.837222 | 0.4 | 1.49E−11 | 44980847 | 154
rs10941679 | rs2067980 | 0.767084 | 0.304748 | 4.34E−08 | 45018074 | 160
rs10941679 | rs11948186 | 0.634593 | 0.266543 | 1.36E−07 | 45087191 | 171
rs10941679 | rs13183434 | 0.685481 | 0.243358 | 9.99E−07 | 45110390 | 175
rs10941679 | rs10051592 | 0.634593 | 0.266543 | 1.36E−07 | 45126063 | 177

TABLE 14 Surrogate SNP markers for marker rs1219648. Markers with values of r² greater than 0.2 to rs1219648 in the HapMap CEU dataset (http://www.hapmap.org) in a 1 Mb interval flanking the marker were selected. Shown are the name of the correlated SNP, values for r² and D′ to rs1219648, and the corresponding P-value, as well as the position of the surrogate marker in NCBI Build 36 and a reference to the SEQ ID NO containing flanking sequences for the marker.
Discovery SNP | Corr SNP | D′ | r² | P-value | Pos in Build 36 | SEQ ID NO
rs1219648 | rs3750817 | 1 | 0.487805 | 5.60E−18 | 123322567 | 220
rs1219648 | rs11200014 | 1 | 0.964392 | 9.67E−33 | 123324920 | 221
rs1219648 | rs2912780 | 1 | 0.965418 | 2.47E−34 | 123327107 | 222
rs1219648 | rs2981579 | 1 | 0.965418 | 2.47E−34 | 123327325 | 223
rs1219648 | rs1078806 | 1 | 0.966387 | 3.53E−35 | 123328965 | 224
rs1219648 | rs2981578 | 1 | 0.844156 | 5.28E−30 | 123330301 | 225
rs1219648 | rs1219648 | 1 | 1 | NA | 123336180 | 237
rs1219648 | rs1219643 | 1 | 0.272727 | 6.24E−10 | 123338345 | 226
rs1219648 | rs2912774 | 1 | 1 | 2.49E−37 | 123338652 | 227
rs1219648 | rs2936870 | 1 | 1 | 2.49E−37 | 123338892 | 228
rs1219648 | rs17102287 | 1 | 0.446154 | 8.77E−16 | 123340181 | 229
rs1219648 | rs2860197 | 1 | 1 | 7.38E−37 | 123341292 | 230
rs1219648 | rs2420946 | 1 | 1 | 1.03E−36 | 123341314 | 231
rs1219648 | rs2981582 | 1 | 1 | 2.49E−37 | 123342307 | 232
rs1219648 | rs3135715 | 1 | 0.426087 | 3.82E−15 | 123344716 | 233
rs1219648 | rs1047111 | 0.90484 | 0.215627 | 5.80E−07 | 123347551 | 234
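The selection rule stated in the captions of the surrogate-marker tables (r² greater than 0.2 to the anchor SNP, within a 1 Mb interval) can be written as a simple filter. This sketch assumes the interval is centred on the anchor SNP (±500 kb). The captions do not say whether "1 Mb" is the total width or the distance on each side, so `half_window` is left as a parameter. Three records take their Build 36 positions from Table 14 for rs1219648, using the smaller of the two LD values in each row as r² (r² can never exceed D′²). The two records named `hypothetical_*` are invented to show exclusions. `LDRecord` and `select_surrogates` are illustrative names, not part of any HapMap tool.

```python
from typing import List, NamedTuple


class LDRecord(NamedTuple):
    snp: str
    r2: float      # r^2 to the anchor SNP
    position: int  # NCBI Build 36 coordinate


def select_surrogates(anchor_pos: int, records: List[LDRecord],
                      r2_threshold: float = 0.2, half_window: int = 500_000) -> List[LDRecord]:
    """Keep markers with r^2 above the threshold that lie within anchor_pos +/- half_window.

    half_window=500_000 assumes the 1 Mb interval is centred on the anchor SNP.
    """
    return [
        rec for rec in records
        if rec.r2 > r2_threshold and abs(rec.position - anchor_pos) <= half_window
    ]


anchor_pos = 123336180  # rs1219648, Build 36
records = [
    LDRecord("rs2981582", 1.0, 123342307),
    LDRecord("rs1047111", 0.215627, 123347551),
    LDRecord("rs1219643", 0.272727, 123338345),
    LDRecord("hypothetical_low_ld", 0.15, 123340000),  # illustrative only: fails the r^2 cutoff
    LDRecord("hypothetical_far", 0.90, 124500000),     # illustrative only: outside the window
]
print([rec.snp for rec in select_surrogates(anchor_pos, records)])
```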

1-49. (canceled)
50. A method for determining an increased susceptibility to breast cancer in a human individual who has not been diagnosed with breast cancer, comprising: analyzing a nucleic acid sample obtained from the individual to detect the presence of allele G of polymorphic marker rs10941679 in the nucleic acid sample, determining an increased genetic susceptibility to breast cancer in the individual from the presence of the allele in the nucleic acid sample, and performing at least one of clinical breast examination (CBE), X-ray mammography, and contrast-enhanced magnetic resonance imaging (CE-MRI) in the individual determined to have the increased genetic susceptibility to breast cancer.
51. The method according to claim 50, wherein the determining an increased genetic susceptibility includes calculating a risk measure for the human individual that includes a relative risk (RR) or odds ratio (OR) of at least 1.10 attributable to allele G of polymorphic marker rs10941679 being present in the nucleic acid sample.
52. The method of claim 50, further comprising analyzing non-genetic information to make a risk assessment, diagnosis, or prognosis of the individual.
53. The method of claim 52, wherein the non-genetic information is selected from age, gender, ethnicity, socioeconomic status, previous disease diagnosis, medical history of the subject, family history of breast cancer, biochemical measurements, and clinical measurements.
54. A method of determining risk of developing at least a second primary breast tumor in a human individual previously diagnosed with breast cancer, the method comprising: analyzing a nucleic acid sample obtained from the individual to detect the presence of allele G of polymorphic marker rs10941679 in the nucleic acid sample, determining an increased genetic risk of developing at least a second primary breast tumor in the individual previously diagnosed with breast cancer from the presence of the allele in the nucleic acid sample, and performing at least one of clinical breast examination (CBE), X-ray mammography, and contrast-enhanced magnetic resonance imaging (CE-MRI) in the individual determined to have the increased genetic susceptibility to breast cancer.
55. The method according to claim 50, wherein the analyzing of the nucleic acid sample comprises amplifying a segment of a nucleic acid that comprises the polymorphic marker by Polymerase Chain Reaction (PCR), using a nucleotide primer pair flanking the polymorphic marker.
56. The method according to claim 50, wherein the analyzing of the nucleic acid sample is performed using at least one process selected from allele-specific probe hybridization, allele-specific primer extension, allele-specific amplification, nucleic acid sequencing, 5′-exonuclease digestion, molecular beacon assay, oligonucleotide ligation assay, size analysis, single-stranded conformation analysis and microarray technology.
57. The method according to claim 56, wherein the process comprises allele-specific probe hybridization.
58. The method according to claim 56, wherein the process comprises microarray technology.
59. The method according to claim 50, comprising: 1) contacting copies of the nucleic acid with a detection oligonucleotide probe and an enhancer oligonucleotide probe under conditions for specific hybridization of the oligonucleotide probe with the nucleic acid; wherein a) the detection oligonucleotide probe is from 5-100 nucleotides in length and specifically hybridizes to a first segment of a nucleic acid whose nucleotide sequence is given by SEQ ID NO: 236; b) the detection oligonucleotide probe comprises a detectable label at its 3′ terminus and a quenching moiety at its 5′ terminus; c) the enhancer oligonucleotide is from 5-100 nucleotides in length and is complementary to a second segment of the nucleotide sequence that is 5′ relative to the oligonucleotide probe, such that the enhancer oligonucleotide is located 3′ relative to the detection oligonucleotide probe when both oligonucleotides are hybridized to the nucleic acid; and d) a single base gap exists between the first segment and the second segment, such that when the oligonucleotide probe and the enhancer oligonucleotide probe are both hybridized to the nucleic acid, a single base gap exists between the oligonucleotides; 2) treating the nucleic acid with an endonuclease that will cleave the detectable label from the 3′ terminus of the detection probe to release free detectable label when the detection probe is hybridized to the nucleic acid; and 3) measuring free detectable label, wherein the presence of the free detectable label indicates that the detection probe specifically hybridizes to the first segment of the nucleic acid, and indicates the sequence of the polymorphic site as the complement of the detection probe.
60. The method according to claim 50, wherein the step of determining an increased genetic susceptibility is performed with a computer using a computer-readable medium on which is stored: an identifier for polymorphic marker rs10941679; an indicator of the frequency of at least one allele of polymorphic marker rs10941679 in a plurality of individuals diagnosed with breast cancer; and an indicator of the frequency of the at least one allele of said at least one polymorphic marker in a plurality of reference individuals.
61. The method according to claim 50, wherein the step of determining an increased genetic susceptibility is performed using an apparatus comprising: a computer readable memory; a processor; and a routine stored on the computer readable memory and adapted to be executed on the processor to analyze marker information for at least one human individual with respect to at least one polymorphic marker that is rs10941679 and generate an output based on the marker information, wherein the output comprises an individual risk measure of the at least one marker as a genetic indicator of breast cancer susceptibility for the human individual.
62. The method according to claim 61, wherein the routine further comprises a risk measure for breast cancer associated with the at least one marker, wherein the risk measure is based on a comparison of the frequency of at least one allele of the at least one polymorphic marker in a plurality of individuals diagnosed with breast cancer and an indicator of the frequency of the at least one allele of the at least one polymorphic marker in a plurality of reference individuals, and wherein the individual risk measure for the human individual is based on a comparison of the carrier status of the individual for the at least one marker and the risk measure for the at least one marker allele.
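Claims 60 to 62 describe a stored marker record (an identifier plus allele frequencies in cases and in reference individuals), a risk measure derived by comparing those frequencies, and an individual risk measure derived from carrier status. The sketch below shows one common way to compute such quantities. It is not presented as the procedure of the invention, and `MarkerRecord`, `allelic_odds_ratio` and `risk_vs_population` are hypothetical names. The risk measure is a crude allelic odds ratio. The individual measure assumes a multiplicative model (relative risks 1, r and r² for zero, one and two copies of the risk allele) and Hardy-Weinberg genotype frequencies, and expresses each genotype's risk relative to the population average. The example uses the rs10941679 frequencies from the "All Invasive Stages (1-4) vs Control" row of the stage table (0.271 in cases, 0.241 in controls). The crude OR computed from these pooled frequencies (about 1.17) need not equal the tabulated 1.19. The tabulated value is therefore passed in separately as the per-allele risk.

```python
from dataclasses import dataclass


@dataclass
class MarkerRecord:
    """Hypothetical stored record for one risk allele (cf. claims 60 to 62)."""
    marker_id: str
    risk_allele: str
    freq_cases: float      # risk-allele frequency among individuals diagnosed with breast cancer
    freq_reference: float  # risk-allele frequency among reference individuals


def allelic_odds_ratio(rec: MarkerRecord) -> float:
    """Crude allelic OR: allele odds in cases divided by allele odds in reference individuals."""
    odds_cases = rec.freq_cases / (1 - rec.freq_cases)
    odds_ref = rec.freq_reference / (1 - rec.freq_reference)
    return odds_cases / odds_ref


def risk_vs_population(rec: MarkerRecord, n_risk_alleles: int, per_allele_rr: float) -> float:
    """Genotype risk relative to the population average under an assumed multiplicative model.

    Non-carriers have relative risk 1, heterozygotes r and homozygotes r**2. With
    Hardy-Weinberg genotype frequencies the population-average risk is ((1 - p) + p * r) ** 2,
    where p is the risk-allele frequency in the reference individuals.
    """
    p, r = rec.freq_reference, per_allele_rr
    return r ** n_risk_alleles / ((1 - p) + p * r) ** 2


rec = MarkerRecord("rs10941679", "G", freq_cases=0.271, freq_reference=0.241)
print(f"crude allelic OR: {allelic_odds_ratio(rec):.3f}")
for n in (0, 1, 2):
    # per-allele OR of 1.19, as tabulated for the same comparison
    print(n, "copies of G:", round(risk_vs_population(rec, n, per_allele_rr=1.19), 3))
```

Scaling by the population average makes the genotype-specific values average to 1 over the reference population. A non-carrier therefore comes out slightly below 1, and carriers come out above it.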
 63. Themethod according to claim 50, comprising determining that the individualis homozygous for allele G of polymorphic marker rs10941679, anddetermining increased susceptibility to breast cancer from the presenceof the homozygous G allele.
 64. The method according to claim 56,wherein the process comprises allele-specific probe hybridization ornucleic acid sequencing.
 65. The method according to claim 50 thatcomprises analyzing the nucleic acid sample by contacting nucleic acidfrom the sample with at least one oligonucleotide probe that is 15 to500 nucleotides in length and that hybridizes to a segment of a nucleicacid whose sequence is shown in SEQ ID NO: 236, or the complementthereof, wherein the hybridization is sequence-specific and identifiesthe presence or absence of allele G of rs10941679.
 66. The method according to claim 50, comprising calculating a risk measure that includes a genetic susceptibility calculation based on the determination of the presence of allele G of rs10941679.
 67. The method according to claim 50, further comprising decreasing the interval between repeated screening in an individual determined to have the increased genetic susceptibility to breast cancer, wherein the screening comprises the at least one of CBE, X-ray mammography, and CE-MRI.
 68. A method of using a nucleic acid sample isolated from a human individual who has not been diagnosed with breast cancer, for determining an increased susceptibility to breast cancer in the individual, the method comprising: analyzing the nucleic acid sample to detect the presence of allele G of polymorphic marker rs10941679, determining an increased genetic susceptibility to breast cancer for the human individual from evidence that allele G of polymorphic marker rs10941679 is present in the nucleic acid sample, and performing at least one of clinical breast examination (CBE), X-ray mammography, and contrast-enhanced magnetic resonance imaging (CE-MRI) in the individual determined to have the increased genetic susceptibility to breast cancer.
 69. The method of claim 68, wherein determining an increased susceptibility includes calculating a risk measure for the human individual that includes a genetic risk component attributed to allele G of polymorphic marker rs10941679 being present in the nucleic acid sample from the individual.
 70. The method according to claim 68, wherein the analyzing of the nucleic acid sample is performed using at least one process selected from allele-specific probe hybridization, allele-specific primer extension, allele-specific amplification, nucleic acid sequencing, 5′-exonuclease digestion, molecular beacon assay, oligonucleotide ligation assay, size analysis, single-stranded conformation analysis and microarray technology.
 71. A method for determining a susceptibility to breast cancer in a human individual, the method comprising: analyzing nucleic acid from the human individual for the presence of at least one allele of at least one polymorphic marker, wherein the at least one allele comprises rs10941679, allele G, detecting the presence of allele G of polymorphic marker rs10941679 in the sample, and determining an increased genetic susceptibility to breast cancer for the human individual from the presence of the at least one allele in the nucleic acid, by calculating a risk measure for the individual that includes a relative risk or odds ratio of at least 1.1 attributed to allele G of polymorphic marker rs10941679 being present in the nucleic acid sample, wherein the step of determining the increased genetic susceptibility to breast cancer is performed using an apparatus that comprises: a computer readable memory, a processor, and a routine stored on the computer readable memory; wherein the routine is adapted to be executed on the processor to analyze marker information for at least one human individual with respect to the polymorphic marker, and generate an output based on the marker information, wherein the output comprises a breast cancer risk measure of the allele as a genetic indicator of the breast cancer condition for the human individual.
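Claim 71 ties the determination to a relative risk or odds ratio of at least 1.1 attributed to allele G of rs10941679, computed by a routine that turns marker information into an output. One possible shape for such a routine is sketched here, assuming genotype calls arrive as two-letter strings (the non-risk allele is written as A purely for illustration), a per-allele odds ratio supplied from elsewhere, and a multiplicative model giving risk relative to non-carriers. The function analyze_marker and the RiskOutput record are hypothetical, and the odds ratios in the usage note are illustrative values, not figures from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskOutput:
    """Per-individual output of the hypothetical risk routine."""
    marker: str
    risk_allele_copies: int
    relative_risk: float            # relative to non-carriers of the risk allele
    increased_susceptibility: bool


def analyze_marker(genotype: str, allele_odds_ratio: float,
                   marker: str = "rs10941679", risk_allele: str = "G",
                   min_odds_ratio: float = 1.1) -> RiskOutput:
    """Convert one diploid genotype call into a risk output.

    Assumptions: each copy of the risk allele multiplies risk by
    allele_odds_ratio, and increased susceptibility is reported only for
    carriers when that odds ratio is at least min_odds_ratio.
    """
    if len(genotype) != 2 or not genotype.isalpha():
        raise ValueError("expected a diploid genotype such as 'AG'")
    copies = genotype.upper().count(risk_allele.upper())
    relative_risk = allele_odds_ratio ** copies
    increased = copies > 0 and allele_odds_ratio >= min_odds_ratio
    return RiskOutput(marker, copies, relative_risk, increased)


if __name__ == "__main__":
    for call in ("AA", "AG", "GG"):
        print(call, analyze_marker(call, allele_odds_ratio=1.2))
```

With an illustrative odds ratio of 1.2, a GG call yields two copies of allele G and a relative risk of about 1.44 versus non-carriers, while an AA call, or any call scored against an odds ratio below 1.1, is not reported as increased susceptibility.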